In Memoriam


Ned Seeman, whose scientific vision and accomplishments this volume was 
conceived to honor, passed away while this book was in the late stages of prepa- 
ration. It is a minor sadness that he will not see the completion of this book; it is 
a considerable sadness that he will not continue to oversee the development of the 
field that he created; it is a great sadness that he will continue to guide and inspire 
his friends and colleagues only in their memories. 

As a tribute to Ned, and as a way to remind the upcoming generation of his 
influence, we (Erik and Nataša) organized a panel discussion at the 28th Interna- 
tional Conference on DNA Computing and Molecular Programming held on August 
11, 2022. The following consists of a combination of notes prepared for the panel, 
recollections of what was said, and revisions reflecting what the participants wished 
they had said—an imperfect transcript. It is followed by a series of vignettes and 
remembrances from a few of Ned’s colleagues and friends. (We wish we could have 
included more!) 


Panel Discussion 


Moderator (Erik Winfree): Ned Seeman passed away November 16, 2021. He was 
a guiding light of our community since the very first DNA computing conference 
in 1995 and had been building the foundations of DNA nanotechnology for decades 
before that. It is sad to acknowledge that the younger members of our community will 
not have the chance to chat with him during coffee breaks, hear his colorful invective 
first-hand, and be inspired by his scientific creativity and dedication to truth. Today’s 
panel discussion is meant as a tribute to his life and science—to recognize what we 
have lost and to celebrate what we have gained. 

For many of us here today, Ned needs no introduction. But for those whose 
intersection with this community is more in the future than in the past, I would like 
to give a brief biography. Ned was born a few months after the end of World War 
II. He was an undergraduate at the University of Chicago, where he transitioned 
from being a medical student to focusing on biochemistry. As a graduate student at 
the University of Pittsburgh he settled on crystallography, which he pursued further 
as a postdoc, first at Columbia, and then at MIT where he worked with Alex Rich. 
Despite doing spectacular work with Alex on transfer RNA (Science 1974) and on 
sequence-specific recognition of DNA by proteins (PNAS 1976), he was not one 
of the lucky ones when looking for a faculty position. He started as an assistant 
professor at SUNY Albany in 1977—for what he later called his “Albanian exile”. 
Another phrase he coined about that time was the inevitable sequence: “No crystals, 
no crystallography, no crystallographer”. That was basically the way his career was 
going by 1980 when he was contemplating the prospects of tenure, or lack thereof, 
having no graduate students and, as he said, “having crystallized nothing of great 
importance”. The invention of DNA nanotechnology later that year was really a case 
of snatching victory from the jaws of defeat. Still, his 1982 paper in the Journal of Theoretical 
Biology—the one that sketched out his ideas for designing multi-branched DNA 
structures that could self-assemble into rationally designed crystals—was largely 
ignored for decades, despite his 1983 paper in Nature demonstrating experimentally 
that immobile branched junctions can be designed and synthesized. How’d he get 
that done, with a basically empty lab? He did whatever he had to. He collaborated 
with a postdoc at the University of Pennsylvania, who later helped him move to New 
York University in 1988, where he stayed for the rest of his career. 

Why do I emphasize this particular period of what might be considered prehistory, 
when the really exciting advances in DNA nanotechnology—which Ned participated 
in equally—came much later as the field started to flourish? After all, when Ned 
was awarded the Kavli Prize in Nanoscience in 2010, the citation lauded him for 
“pioneering the use of DNA as a non-biological programmable material for a count- 
less number of devices that self-assemble, walk, compute, and catalyze”. There are 
several reasons. Perhaps the main one is that this period helped forge Ned’s scientific 
character, and thus it helps for understanding him and his science. Another is that it 
reminds us to be thankful to him, for coming through rather than giving up. A third 
reason is perhaps as a message to the younger people here, that yes, not all careers 
in science get the lucky breaks, and no, it is definitely not fair, but much of what we 
are is determined by how we respond to that. Ned modeled one way. 

This panel will remind us of some aspects of that way. We’ll organize the discus- 
sion into comments from the panelists on 3 broad topics, and then open the floor for 
participation from the audience. The 3 topics are (1) His science. A sampling of Ned’s 
papers that touched us somehow. (2) His person. Remembrance of his influence on us 
through mentoring and inspiration. (3) His legacy. The field of DNA nanotechnology 
has come so far in the 40 years since Ned envisioned the foundations, so where is it 
going in the next 40 years? 

OK, before we get going, I’d like to introduce the panelists. We were fortunate 
to get people with a range of perspectives and backgrounds. I am Erik Winfree, 
from Caltech. I was a graduate student with John Hopfield at Caltech, but collabo- 
rated with Ned during that time. I look at the world through the lens of a theoretical 
computer scientist and natural philosopher. From left to right, the other panelists 
are: Hao Yan, from Arizona State University. He was a chemistry graduate student 
with Ned, then a postdoc with John Reif’s group in computer science at Duke. His 
contributions to the field are too numerous to mention, and among the most beau- 
tiful—his group’s demonstration of 3D DNA origami with curved shapes, such as a 
nanoscale vase, comes to mind. Tom Ouldridge, from Imperial College London. He 
was a physics graduate student with Ard Louis and Jon Doye at Oxford, where he 
developed the oxDNA model that has proven to be the premier simulator for DNA 
nanotechnology. His research has expanded dramatically into fundamental questions 
about biophysics, circuits, and molecular machines. Simon Vecchioni, from NYU. 
He did his Ph.D. in biomedical engineering with physicist Shalom Wind at Columbia 
University, where he looked at the conductivity of DNA-based nanowires, and he is 
currently a postdoc in the Seeman group, where he has been straddling fields. Shelley 
Wickham, from the University of Sydney. She was a graduate student in physics with 
Andrew Turberfield at Oxford, where she worked on DNA-based motors and robots, 
and then a postdoc with William Shih at Harvard, where she developed modular 
barrel-shaped DNA origami. Her current research includes interfacing origami with 
biological membranes. Nataša Jonoska, a professor of mathematics at the University 
of South Florida and a frequent collaborator of Ned’s, was supposed to be with 
us today as well, but unfortunately at the last minute she was not able to attend the 
conference. She is the main reason this panel happened. In a few places later on, I 
will read a few comments she sent by email. 

Now, let’s dive into the first topic: Papers by Ned. Hao will go first. 

Hao Yan: My name is Hao Yan, a former Ph.D. student of Ned; I graduated 
in 2001 from Ned’s lab. My favorite paper of his is the 1993 Biochemistry paper 
titled “DNA double crossover molecules”, in which he and his student Tsu Ju Fu 
reported and characterized a series of DNA molecules containing two crossover sites 
between helical domains. These include both antiparallel and parallel DNA double- 
crossover molecules. These DNA constructs laid the foundation for future works of 
structural DNA nanotechnology, including Seeman and Winfree’s 1998 Nature paper 
on designed two-dimensional DNA arrays using antiparallel DNA double-crossover 
molecules, and later Rothemund’s DNA origami, and many more. In fact, in many 
of Ned’s earlier papers he proposed ideas that later were realized and expanded by 
colleagues in our field. Examples include the use of meso-junctions and anti-junctions 
to create reconfigurable cascading arrays and replicable single-stranded nanostructures, 
to name a few. 

Tom Ouldridge: Early in my Ph.D., Ned Seeman visited Oxford to give a talk. At 
the time I was deep into developing oxDNA and very focused on using it to simu- 
late self-assembly. He talked about Omabegho’s motor: “A bipedal DNA Brownian 
motor with coordinated legs” (2009). The idea was to create a DNA-only system 
(no cheating with enzymes, Shelley) that would walk along a track. I remember 
being inspired by the idea of creating active, fuel-consuming machines like those in 
biology. The audacity of rationally engineering molecular machines, where chemical 
free energy is carefully channeled into mechanical action at a microscopic scale, still 
blows me away. Indeed, I think an underappreciated aspect of biology is just how 
good it is at applying chemical free energy in a standard form to achieve essentially 
any goal. 
I also realized that this was an area that oxDNA could be immediately applied 
to. Nowadays the model is mostly used for simulating large structures, but at the 
time the opportunity to say something useful about such small, intricate systems was 
invaluable. Helpfully for me, Andrew Turberfield’s lab, just round the corner, was 
also very interested in molecular machines, and simulating those machines and the 
underlying displacement motif generated oxDNA’s first useful results. 

Simon Vecchioni: I have two favorite papers by Ned. The first is “DNA Nanotechnology 
at 40” (2020), his perspective in Nano Letters. It is a short, non-technical read, 
and in it, Ned highlights that he is often asked what is next in the field. In published, 
academic text, he writes: “I DON’T KNOW”. I think this captures his acerbic and 
brilliant wit fairly completely. 

More broadly, I am drawn again and again to “Designed Two-Dimensional DNA 
Holliday Junction Arrays Visualized by Atomic Force Microscopy” (1999), a JACS 
paper authored by Chengde Mao. As a graduate student in a nanotechnology and 
nanofabrication lab, DNA nanotech was a fascination for all of us, but the method- 
ology hadn’t been trained into our bones the way it had been for Ned’s group. At 
the time, this paper felt like a fount of knowledge—it is laid out well, identifies the 
design rules, and enumerates the composable parts. It’s a roadmap for the aspiring 
DNA nanotechnologist and presents a means of creating and analyzing arbitrary 
DNA constructs from simple parts. Integer numbers of helical turns, stable immo- 
bile Holliday junctions, sequence design, and periodic windows are visualized by 
atomic force microscopy—it’s all there. Engineering new chemistry and swapping 
components within the parallelogram arrays in this paper felt accessible and doable. 
For anyone looking to enter the field as I was, this paper presents an excellent launch 
point. 

Shelley Wickham: My favorite paper of Ned’s is “A proximity-based 
programmable DNA nanoscale assembly line” (2010)—a Nature paper with first 
author Hongzhou Gu. This paper came out while I was in my Ph.D., and I was 
really impressed by the ambition. It had DNA walkers using strand displacement 
and DNA origami and cargo loading and assembly and AuNPs and it worked. It also 
had the most lurid color scheme for AFM images I have ever seen. A tripod DNA 
“walker” rolls along a track, picking up and putting down feet with toehold-mediated 
strand displacement. At each of 3 sites along the track, a depot tile or “cassette” sits 
pre-loaded with different types of AuNPs. Depending on the trigger DNA strand the 
cassette swings the cargo over, which then gets added to the motor. Without a trigger, 
no cargo is loaded at that depot. The paper shows they produce 8 different structures 
at the end, depending on whether the cassette was switched on or off at each step. 
The system requires a total of 11 sequential switches. 

This paper is still very cool and is something people are still trying to do—bring 
together DNA circuits with nanostructures to make assembly lines that construct 
other materials. 

Erik Winfree: My first encounter with Ned’s work came in early 1995, when I 
met with Paul Rothemund—then not even a graduate student yet—to discuss Len 
Adleman’s paper on DNA computing. Paul had been working on how to build Turing 
machines with restriction enzymes, as an undergraduate class project, and he had a 
huge stack of photocopied papers for me to look at, mostly by Ned. It’s hard to 
choose a specific paper to talk about for this panel, as I read many of his papers line 
by line at that time, and they all influenced me deeply, and I’ve read many since then 
and can say the same. But the one I’ve chosen is “De novo design of sequences for 
nucleic acid structural engineering” (1990). This wasn’t his first paper on sequence 
design, but it was the most detailed one available to study when I was designing my 
first double-crossover molecules as a graduate student, so it made a real impression 
on me. To be honest, so did his earlier 1983 paper, “Design of immobile nucleic 
acid junctions”, where he described how he designed the sequences for his first 
experimental demonstration of an immobile 4-way junction. It’s amazing! Ned had 
practically written a little mini-proto-NUPACK! It used base-pairing constraints to 
specify the target multi-stranded structure; it used test tube concentration calculations 
to determine the fraction of target assembly as a function of temperature, to maximize 
stability; it used partition function calculations to maximize the fidelity of specific 
binding and minimize spurious binding—all reminiscent of what Niles Pierce calls 
positive and negative design principles. Only, it was written in FORTRAN and ran 
on a Univac 1100/82, and it didn’t really scale up beyond the 64-nt four-armed 
junction that they designed. So what did the 1990 paper do? Basically, it described 
what Ned had learned about how to scale up as needed for the experimental project they 
were in the middle of at the time: the DNA wire-frame cube that you all do or 
should love. Since Ned didn’t have the algorithmic chops that Niles later brought to 
bear on the problem, instead of finding a better algorithm to more efficiently solve 
the original design criteria, Ned’s approach now was to identify a set of simpler 
criteria that were “good enough”—at least for his purposes. These criteria centered 
on sequence symmetry minimization—trying to minimize the length of repetitive or 
complementary subsequences. What was exciting is that these criteria generalized 
naturally to not just junctions, but essentially arbitrary DNA constructs based on 
Watson-Crick pairing, and were susceptible to basic optimization algorithms. The 
field of course has come a long way since Ned’s early forays into sequence design, but 
even today many DNA systems are designed based on the frameworks he established 
40 years ago. 
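
[Editor’s note: As a rough illustration of the sequence symmetry minimization idea described above, the following is a minimal sketch, not Ned’s program. It assumes a fixed window length k and simply reports length-k subsequences that repeat, or whose reverse complement also occurs, within a set of strands. The function names and the choice not to exempt intended pairings are hypothetical simplifications made for this example.]

```python
from collections import defaultdict

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the Watson-Crick reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def kmer_positions(strands, k):
    """Map each length-k subsequence to the (strand index, offset) pairs where it occurs."""
    positions = defaultdict(list)
    for index, strand in enumerate(strands):
        for offset in range(len(strand) - k + 1):
            positions[strand[offset:offset + k]].append((index, offset))
    return positions


def symmetry_violations(strands, k):
    """Return (repeats, complementary) lists of offending k-mers.

    Assumption for this sketch: intended duplex pairings are NOT exempted,
    so it is meant for strands (or segments) that should not pair at all.
    """
    positions = kmer_positions(strands, k)
    # A k-mer that occurs more than once is a repeated subsequence.
    repeats = sorted(kmer for kmer, where in positions.items() if len(where) > 1)
    # A k-mer whose reverse complement is also present could pair spuriously.
    complementary = sorted(
        kmer for kmer in positions if reverse_complement(kmer) in positions
    )
    return repeats, complementary


if __name__ == "__main__":
    print(symmetry_violations(["ACACAC"], 3))
    print(symmetry_violations(["AACAGT"], 3))
```

A design loop in this spirit would mutate bases and re-run such a check until both lists are empty for the chosen k; a real design tool would also have to exempt the pairings that the target structure requires.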

There’s another take-away that I find to be really interesting. Ned is often called a 
chemist, for obvious reasons, so it’s easy to forget that he was also a coder, a computer 
programmer. He wrote a lot of programs: for crystallography, for simulating branch 
migration, for computer graphics, and for sequence design. I think it’s no accident 
that identifying DNA as a uniquely programmable molecule was the insight of a 
computer programmer. And I also think it’s no accident that the magic that powered 
the birth of this field was unleashed by a person whose understanding and capability 
spanned from algorithm design to experimental investigation, where all these aspects 
of reality could get tangled up with each other. 

Nataša Jonoska: I heard Ned’s talk for the first time at the second DNA computing 
conference in Princeton in 1996. He showed the infamous image of the cube, from 
“Synthesis from DNA of a molecule with the connectivity of a cube” (1991). I am a 
mathematician and did not know much about biochemistry; my knowledge 
of it was at the level of freshman biology. Constructing a 3D structure 
that self-assembles through careful sequence design and predictable topology was 
completely off the wall for me. It was an inspiration that made me think about 
3D structures and to design 3D DNA graphs that we later published together. The 
subject opened up several new mathematical problems in graph theory (study of new 
invariants) and topological graph theory (new approaches in graph embedding in 
surfaces). Those early days of DNA computing were filled with various approaches 
to solve NP problems (often related to questions about graphs). Our first approach was 
to envision a variation of DX to solve the Hamiltonian problem through assembling 
the graph structure itself. When I emailed Ned to ask whether this variation of DX was 
feasible, he said “make a model”. I brought the model to the next DNA computing 
meeting and was thrilled to get a nod of approval from him. [Editor’s note: Here 
Nataša is not talking about a mathematical model or a computer simulation model, 
but rather something more like a tangle of metal jacks and plastic tubing—a physical 
thing you can get your hands on and feel.] 

Moderator: Great. The second topic concerns Ned as a mentor, visionary, and 
human being. Again, Hao will go first. 

Hao Yan: Ned was a great mentor, a great friend, and a great inspiration. Ned left 
a legacy that will last: not only did he single-handedly found the field of structural 
DNA nanotechnology in 1982, a field many of us are enjoying working in, a field 
that has become highly interdisciplinary with lots of creative minds working together 
to give DNA unique capabilities to build, sense, compute, and actuate, but he also 
founded the International Society for Nanoscale Science, Computation and Engi- 
neering (ISNSCE) and served as the founding president of the society. In Ned’s 
vision, the society will promote the study of the control of the arrangement of the 
atoms in matter, examine the principles that lead to such control, develop tools 
and methods to increase such control, and investigate the use of these principles 
for molecular computation and for engineering on the finest possible scales. On a 
personal note, there have been many things I have learned from Ned. Two important 
traits of his that have impacted me are persistence and support for students. Ned never 
gives up, and he once told us “if something doesn’t work, use a hammer, if it still 
doesn’t work, use a bigger hammer”. Ned has been very supportive of his students 
and junior colleagues in the field. I have recently been serving on the faculty search committee in 
my department and have had the chance to read some of Ned’s recommendation 
letters for his students. Ned’s letters can be 5 pages long, and he inserts figures 
in the letter and really elaborates on what the students’ contributions to the project 
were. I have never seen letters from other people with figures; Ned really spent time 
working on his letters to support us. I now do the same for my own students, 
putting my heart into supporting them and promoting their academic careers. Ned and 
I have developed from a mentor-student relationship into more of a close personal 
friendship. I remember, 5 years ago, Ned was trapped in an ICU due to illness in 
Nanjing while he was traveling in China and India. I emailed him and asked how I 
could help; he emailed back and said “get me out of the jail”. For Ned it was torture 
that he lost freedom and could not work on papers and proposals. Ruojie and I flew 
over to China immediately and tried to get him out, but the doctors wouldn’t release 
him. I witnessed how vulnerable a big giant can be and how desperate Ned was when 
he was isolated from work, like a fish out of water. 

Tom Ouldridge: There are two things I’d like to recognize. The first is that, to me 
at least, Ned’s actions in founding the field seem to epitomize an attitude of “if we 
really believe that DNA has the properties we say it does, we should be able to do 
these amazing things with it”. I think this is a good maxim that can be applied more 
broadly across scientific endeavors. It drives us to be ambitious, and it also drives us 
to build things that actually test and extend the limits of our scientific understanding. 

Secondly, Ned coined—or at least used for the first time in my hearing—a term that 
I think describes what I (and many others) do. Since we build functional biomolecular 
systems from scratch, I’ve always thought that “synthetic biologists” is a better 
description of us than of the people who build transcription factor networks in cells. 
But to be fair, they should have been called genetic engineers, and that term was 
already taken. But I once heard Ned describe some of his work as “kleptobiology”— 
klepto being derived from the ancient Greek word for “theft”. And that’s a brilliant 
summary of how we steal useful biological molecules and apply them wildly out of 
context for our own purposes. 

Simon Vecchioni: Nanoscience as a larger field is full of industrial partners and big- 
name physicists and chemists. Fitting into this web of well-established and competi- 
tive giants is a rather lonely affair. After joining Ned’s lab as a postdoc, I find myself 
a part of a family I never expected to have. There’s a sort of crusty human warmth 
to the whole field that is surprising and beautiful. You can hear everyone’s stories 
and trials and efforts and aspirations, and it all intersects back to Ned, to his wild but 
methodical vision. 

Ned was a man without pretense: he lived as himself, every day. He made the 
same walk to the lab, asked the same hard questions, and had a gigantic hidden 
soft spot for other people, and he loved the art of science—no matter his health or 
condition. When faced with challenges, he would often tell stories that landed squarely 
as non sequiturs, but had some roundabout hidden insight that would point you where 
you were going. He staunchly defended the intellectual property of figures in the 
field and would tolerate no theft of ideas. To a fault, he made sure that authorship 
remained with inventors and contributors; this current still runs through our field 
with its strong spirit of collaboration and uplift, rather than fierce competition and 
infighting. He was a light to all, and the character and tenor of this community—this 
ad hoc family—is the evidence. 

Shelley Wickham: I first spoke to Ned at a GRC on nanofabrication in NH in 2010, 
as a Ph.D. student when he came up to my poster on DNA motors—just after the 
paper I described above came out. When he first came up to my poster, I was a bit 
intimidated. He took a long pause and my anticipation built up, and then his first 
words were...“deus ex machina”. It was not the reaction I was expecting! I now 
know this phrase refers to “a person or thing (as in fiction or drama) that appears 
or is introduced suddenly and unexpectedly and provides a contrived solution to an 
apparently insoluble difficulty”. At the time, I had no idea what it meant. Ned did 
explain it to me, though: my work used a DNA motor powered by a nicking enzyme. 
We ended up having a great chat, and I visited his lab at NYU after the conference. 
He was advocating the elegance of DNA-only systems, as opposed to the “god-like” 
power of enzymes. That said, through great work like the PEN toolbox and genelets, 
the field is getting better at harnessing these god-like properties of enzymes. Like 
others, I found Ned was always very open and engaged with people at all levels, 
especially students and postdocs. He was honest about academic careers—that it is 
hard, but it is worth it to be ambitious, and it’s ok to have a messy office. The last 
time I spoke to Ned was DNA25 (2019) in Seattle. Despite having recently been very 
ill, he was still ambitious and talking about the next grant, and the next project in a 
way that I found very inspiring. 

Erik Winfree: I first met Ned when I was a graduate student at Caltech, and he’s a 
big reason why I am still at Caltech. I met him in Princeton, actually, when we both 
attended the first conference on DNA computing, in 1995. At the time I considered 
myself a mathematician and computer scientist, and only a few months earlier I had 
learned about Ned’s work from my friend Paul Rothemund, who was more familiar 
with chemistry. As part of my talk at the conference, I discussed my embryonic ideas 
for how Ned’s DNA nanotechnology work could be used as the foundation for a way 
to compute by molecular self-assembly. I was super surprised to discover that Ned 
was in the audience and super excited that he invited me, Paul, and a few others to 
get a beer at the pub afterward. That’s where he told me my ideas were unlikely to 
work—not with the molecules I was envisioning, at least. He also told me what he 
thought was the fix: a different flavor of DNA molecule called the double crossover 
that his group had recently invented and characterized. So he invited me to visit his 
lab over the summer, where he trained me as an experimental chemist. A few years 
later, we had a paper in Nature. 

Ned’s efforts to help young people become successful in science are a common 
story. His heart and his actions both pushed that way. But more than that, his example. 
He is someone who illustrates what it means to have a vision, to see things that others 
don’t see—or don’t see the value of—and to pursue it for decades in near-isolation. 
Because that’s what it takes. It takes a strength of character and a belief in oneself 
that very few people have. So I learned from Ned what it means to bring something 
really new to science: it is to value things that others don’t (yet) value, to care about 
things that others don’t (yet) care about—that is where one can contribute something 
truly unique, something better than just being able to run the race faster and be first 
to a well-recognized goal. Of course, it might also mean that one doesn’t have an 
easy job getting funding or recognition. I am really glad that, ultimately, Ned has 
been recognized, not just by his friends and close colleagues, but by the broader 
scientific community, as the visionary and genius that he truly is. He will be missed 
and remembered. I think of the “soul” as the collection of recurrent patterns and 
algorithm snippets that characterize a person’s behavior, so I mean it quite literally 
when I say that fragments of Ned’s soul will live on in all the science and in every 
scientist that he touched. 

Nataša Jonoska: Those of us who knew him can attest to his unique personality, 
the hidden kindness, and grand humanism. He offered to host me in his lab for my 
first sabbatical; so I spent two semesters in his lab. It was a super high learning curve 
for me as I had not done any lab work prior to that. But that was probably my best 
professional decision (though at the time a very hard one). It turned out to be a great personal 
decision as well. It was 2001, and we started a collaboration that transformed into a deep 
friendship that lasted 20 years. I was privileged to know him. His perseverance in 
pursuing the 3D DNA crystal paid off only in 2009. Lots of other results were 
happening in the lab during that time. Many great papers got published, but his 
persistence and ultimate vision were the main driving force; they brought new ideas and 
pushed students and junior researchers to think in a unique way and to generate lots 
of productive discussions. We should learn from his example: in science one must 
have a grand vision and a goal as a motivation. There are often disappointments in 
science, but one must see those as temporary setbacks. Also, colorful language can 
sometimes ease the frustration. 

Moderator: Let’s move on to the third and final topic: Where is the field going 
and how might it develop? 

Hao Yan: Future directions of our field that I expect will be important include the 
following: (1) Molecular machines of the future will entail the combination of rational 
design and directed evolution to develop molecular machines driven not only by DNA 
strand displacement, pH, or light but by chemical fuels like those used by nature’s 
molecular motors; (2) universal design platforms that integrate design and modeling 
tools from DNA, RNA, and protein design to create ribonucleoprotein complexes, an 
exciting example of which would be the design of an artificial ribosome; (3) structural 
control at the atomic level, as Ned emphasized when he founded ISNSCE. 

Tom Ouldridge: I am a natural pessimist in this sense and usually wrong about 
what has potential. It is interesting to note that I’m seeing more companies pop up 
that want to make use of origami or strand displacement technology—in many cases 
using side products of the scientific research. But scientifically, I think a move toward 
systems that can operate autonomously over long timescales is an important area to 
develop, if we want to interface with biology or build synthetic life-like systems. 

Simon Vecchioni: To get any part of structural DNA nanotechnology to work 
at all, Ned painstakingly built the field from its theoretical foundations up. The 
structure-function relationship of biological systems was a powerful undercurrent of 
the early works, and the helical twist to emergent topology relationship was rehashed 
again and again. Each subsequent step forward in the architectural expansion of 
DNA nanotechnology was accompanied by a well-elucidated structural toolkit. 
Though Ned staunchly claimed not to know what was coming, he had a strong 
proclivity toward expanding the chemical design language of DNA. Metal base pairs, 
guanine tetraplexes, modified backbones—each of these components required a new 
understanding of structure to enable the growth of new functions. 

The goals of DNA nanotech, surprisingly, have more or less been clear since the 
beginning: (1) engineer soft matter into designed architectures; (2) use these archi- 
tectures to organize biological and/or nanoscale components for structural and func- 
tional studies; and (3) generate emergent (nanoelectronic or otherwise) properties 
from DNA shapes. To this end, it has become clear that DNA is fantastic as a building 
material, but lacks some of the physical pizazz of other nanoscale components. In 
engineering functional nanotechnologies, the field will need to carefully carve out 
new structure-function relationships to equip DNA and DNA nanotechnologists to 
harness novel optical, electronic, computational, and mechanical properties. 

Ned’s lab today is deeply committed to expanding DNA nanoscience “Beyond 
Watson and Crick”, and by that we mean adding new base pairs, new backbones, new 
secondary and tertiary interactions, and overall (carefully) breaking the design rules 
that have become so fundamental and dear to us. Per Ned’s tried and true method, 
structure comes first. As we step into new chemistry, we imagine and hope for an 
expanded DNA design language that can encompass the deep and far-seeing vision 
that Ned had for this field. 

Shelley Wickham: This ties back to Ned’s ambitious 2010 paper, which influenced 
me a lot. I see in the future more work on bringing structures and circuits together 
to use DNA as a dynamic scaffolding tool to make other things. These are still hard 
goals, and as a field we are still not great at them, but there is a lot of recent progress 
that is really exciting. Could we make molecular 3D printers? Crystal lattices to 
template optical properties? Reconfigurable responsive materials/structures? Have 
kinetic control of assembly? Those are some future directions I am excited by. 

Erik Winfree: In the early days of DNA nanotechnology, Ned often emphasized that 
the ultimate goal is control. The masterful scientist should know EXACTLY what is 
in the test tube, should be able to specify exactly, like Angstrom-level exactly, where 
each atom is. This was the crystallographer in him speaking. But DNA nanotech- 
nology rarely reaches that pinnacle of control, especially in the hands of neophyte 
newcomers from fields like mathematics and computer science, such as myself and 
many others in this room. There are synthesis errors, unintended folds, and stuff 
sticking everywhere, and even a covalently perfect molecule will warp and wobble 
in the strangest ways. This can hardly be considered a good thing, but nonetheless 
I started to get the impression that Ned was coming to the view that being able to 
do a great many different things, imperfectly, was also of considerable merit, even if 
perfect control is forsaken. I’d like to use that position as a starting point for where 
DNA nanotechnology and molecular programming might be going. 

If you examine two crystals of, say, diamond, they will be identical to each other 
down to the Angstrom level, differing only in relatively minor ways such as overall 
size and shape, or the locations of defects. They’re basically superimposable. In 
contrast, consider two individual bacterial cells, say E. coli: no attempt to rotate 
and translate one to superimpose on the other will match more than a handful of 
atoms. The level of randomness in biological cells is astounding. Where are the 
chromosomes? Where are the enzymes? The ion channels? The flagella? And it’s all 
moving about and rearranging constantly. Of course, many other things are nearly 
precisely constant and reliable—the sequence of the DNA paramount among them. 
But the point is that it can be hard even to draw the dividing line between what is 
precisely controlled and what is left to vary in every which way. Yet, that may be 
what we need to figure out how to do. I believe that it is only a matter of decades 
before DNA nanotechnology scales up to constructing molecular systems as large 
and as complex as biological cells, and that the biggest bottleneck to doing so will be 
knowing how to design such systems to be functional without over-specifying and 
trying to control the parts and aspects of the system that do not need to be controlled— 
those things are better left to be random. It is my hope that design paradigms that 
reflect this distinction can be developed for DNA nanotechnology, and it is my guess 
that successful approaches will deeply borrow concepts from neural computation, 
machine learning, and constraint satisfaction programming: define the goals, not the 
details. 

Moderator: That’s it for the pre-arranged portion of this panel. For those of you 
who knew Ned, I hope it helped you remember him more deeply for a while. For 
those of you who didn’t know him, I hope it inspired you with the desire to take 
some time to get to know him better. Read his papers. Find out about his life. A good 
place to start for the latter is the autobiography that’s on the Kavli Prize web page 
or the related memoir that appeared in the American Crystallographic Association 
in 2014. You can also find online some extensive interviews by other scientists and 
by historians such as Patrick McCray, or, of course, obituaries such as the one by 
Philip Ball in Nature. 


Vignettes 


Jim Canary: Ned’s research and DNA nanotech expanded greatly as other struc- 
tural tools (besides X-ray crystallography) such as AFM became available. The 
early designs, such as characterization of the DNA cube, required brilliant control 
experiments carefully designed and executed. Those experiments could not confirm 
structure but only topology. At the time of his passing, he was eager to learn more 
about cryo-electron microscopy. Make no mistake, though, that Ned’s first love was 
X-ray crystallography. Ned was the most focused scientist that I have met. One day 
in his office I noticed that there were more than 15 cheap umbrellas scattered every- 
where. When I asked about this, he said that when he leaves and finds out that it is 
raining, he buys an umbrella from one of the people selling them on the street. The 
point being that when he was leaving the office, he was not checking or thinking 
about the weather—or even looking out the window—he was still thinking about his 
work. 

Paul Chaikin: Ned was a larger-than-life character who stood out even among the 
other characters that haunted the East Village, Chinatown, and Washington Square, 
places he enjoyed much more in the rough old days before they got gentrified. I 
first met Ned when I organized the Condensed Matter Physics Gordon Conference 
in 2001 with no theme but the idea that it would be great to gather the most creative 
people. Albert Libchaber at Rockefeller told me “you have to invite Ned”. I picked 
Ned up at the train station in New London, he wanted to get a beer, and we spent the 
evening talking about DNA, knots, cubes, and why the hell physicists couldn’t find a 
way to “see”/prove his remarkable structures. I was sold and we started collaborating 
later that year. He’s one of the reasons I relocated to NYU. When I got there, we 
started a group to see how difficult it would be to make our own non-living system 
that could self-replicate. From then on, the Self-Replication Group, SRG, met every 
other week in his office, walking and sitting around seemingly random piles of old 
papers and books, dodging flimsy models of his structures hanging from the walls 
and ceiling. Amazingly, when Ned wanted to make a point, after a flurry of language 
reminiscent of my Brooklyn youth (Ned probably got it from Chicago’s streets) he 
could always find the appropriate paper or toy. It was a great collaboration. Every 
crazy idea I had, Ned either had done it, knew how to do it, or figured it out by 
the next meeting. Ned was brilliant. The students and postdocs, who survived (and 
enjoyed) the spicy (putting it mildly) rants, loved Ned and learned how science was 
really done. Ned and I went to many meetings together, often by car, and I heard, in 
typical Ned salt, about the incompetence of all administrators and some colleagues. I 
heard many of his talks. He always warned me they were the same talk. I must have 
seen the slides “DNA: Not merely the secret of life” and Escher’s “Depth” a hundred 
times. 

Anne Condon: I remember clearly the first time I heard Ned speak, at an early DNA 
computing workshop. His contributions to DNA nanotechnology were inspiring, and 
the technical points of his talk were illuminated with colorful pictures of artifacts 
and puzzles that he had encountered on his travels. I was mesmerized! His personal 
story was also inspiring; I’m sure I wasn’t the only one who appreciated his down to 
earth manner and his candor in recounting the rocky and unpredictable turns of his 
early career. A few years later, when I hosted Ned’s visit to the University of British 
Columbia, Rob Scharein, a mathematical topology and knot visualization expert, 
attended the talk. I came to appreciate Ned’s deep knowledge not only of chemistry 
but also of the mathematical aspects of topology. Ned stands out for me as someone 
who cared deeply about community, and he shaped ours for the better. 

Lila Kari: Ned’s contributions to DNA nanotechnology have been so profound 
and transformative, and his influence on multiple fields of life sciences and compu- 
tational sciences so far-reaching, that I think Hollywood will eventually make a 
movie about him. I thought about that and imagined what such a movie would be 
like. I thought about Ned, about our many professional and personal interactions, 
including during his visit to London, Canada, for the “Unconventional Computation 
and Natural Computation” conference in 2014. (His invited talk was “DNA: Not 
merely the secret of life”—a wording that only Ned’s mind, a mix of far-reaching 
scientific ideas, artistic bent, and sparkling wit, could conjure.) I thought then, what 
could the title be of this movie about Ned? How would I characterize Ned, in just a 
few words? Yes, he was a brilliant scientist. And yes, he was brash, direct, and used 
a lot of swear words. But, at the same time, there was always a palpable and unmis- 
takable gentleness, fairness, and humility behind this rugged appearance. And then 
I saw it, in my mind’s eye: The Hollywood movie about Ned’s life story would be 
made, a blockbuster, and it would be called “A Scientist and a Gentleman”. Because 
this is what Ned was. A scientist without equal, and a gentleman beyond words. We 
will miss you, Ned. 

Tim Liedl: Needless to say how difficult it is to fit into a few words the immensity 
of impact Ned had on so many researchers and so many lives. On the one side he 
created visionary and fundamental science, of his own and of many dear colleagues 
who are part of his direct and indirect academic family. And at the same time he has 
been an attentive and caring friend to us. I feel that his dedication to scientific reason 
and his eminent humanism might not be the only two factors making DNA nanotech 
such a great field to contribute to, but they are certainly the most important ones. 

Paul Rothemund: In the late nineties when I was a graduate student, Ned graciously 
offered to let me stay at his apartment while I was visiting his lab. I remember a wall 
of books and finding Ned’s favorite, Slaughterhouse-Five by Vonnegut, from which 
he often quoted “so it goes”. As a bed he offered up his couch, clearly unused for 
several years, as it was coated inches deep in the leaves of a nearby, dead, decorative 
ficus. As I curled up in the leaves I remember feeling excited to be on such a great 
adventure, getting to spend time with Ned in New York. 

Ned seemed to get to roughly steady state, as a productive PI, for at least a couple 
decades. As a young scientist, trying to find a set of mechanics by which I could 
function within academics, I always found Ned’s methods at least admirable, even 
when for some reason I found it hard to implement them myself. These mechanics 
include a lot of things: where and how to find and train students, where and how to 
apply for funding, with whom and how to collaborate, etc. I'll mention a couple that 
I remember as remarkable. 

For the “where to find students”, the answer was Beijing University. Many 
academics cultivate such a supply of talent; I have a Texan chemist friend who 
depends on Thai universities and know a German-Californian computer scientist 
whose large lab is predominantly South Korean. Some future historian will figure 
out what Ned’s lab has meant for the diffusion of ideas back and forth between the 
USA and China, especially should DNA nanotechnology have a large impact on health 
or electronics. I like to think that Ned’s lab has had a positive effect on US-China 
relations at some scale. 

For the “where to find funding”, Ned drew from a variety of sources: the National 
Institutes of Health (NIH), the National Science Foundation (NSF), and the Office of Naval 
Research (ONR). NSF support for DNA nanotech is well documented, and Ned’s 
earliest NSF grant dates to 1997, but Ned was the only person I have known who 
was able to keep his DNA nanotech program alive with NIH funding on a sustained 
basis (started in 1982), primarily through grants for studying the intermediates of 
biological recombination. Despite the evidently thrice yearly “box from hell” of 
grants to review, which he received from whichever panel he served on, Ned seemed 
to benefit greatly from the NIH. Early NIH grants are cited in his 1987 paper “The 
design of a biochip: a self-assembling molecular-scale memory device”. Similarly 
Ned seems to be the one who made the original connection between the field of 
DNA nanotech and the ONR in 1989. Generations of scientists both within his lab 
and without (including my own lab) benefited from the enlightened largesse of the 
ONR, whose money was about as “no-strings-attached” as money can be. If DNA 
nanotechnology is ever successful at an industrial scale, ONR’s early support of 
Ned’s work should be lauded. 

For many years Ned relied on Ruojie Sha for training, lab culture, and manage- 
ment. Their symbiosis made Ned’s lab and its extraordinary output possible. From 
a funding point of view, having someone besides the PI to help maintain lab coher- 
ence is difficult (although NIH and biology seem more amenable to supporting such 
a model). The churn of graduate students and postdocs makes many laboratories 
experience a sort of boom-and-bust cycle of productivity. The sustained success of 
laboratories like the Seeman lab, which are able to pull off a PI-and-partner model, 
suggests to me that perhaps there should be more explicit support for those who 
would try it. 

With respect to training graduate students, Ned had a formula for success. It began 
with a boot camp of sorts (in my memory this may have been anything from two 
weeks to two months), in which a student new to the lab had to learn all the techniques 
the lab was practicing, and reproduce its classical results. Early on I think this training 
involved techniques such as native polyacrylamide gels (“formation gels”) of 
multi-stranded complexes like double crossovers. Later I know it included experiments 
like atomic force microscopy of double-crossover lattices. This rite of passage was 
further insurance against boom-and-bust cycles in the lab and ensured continuity of 
lab protocols—had we at Caltech ever been able to implement such a program, it would 
have saved years of reinvention as new generations entered the lab. 

After the training period, Ned would have a student work on an initial project 
that was carefully titrated to be incremental-yet-significant—a project that would 
test the mettle of the student, but would be guaranteed to yield a publication and 
safeguard the progress of the student toward a Ph.D. Success at this project would 
land the student a second, high-risk project—the sort of project that could open a 
new chapter for the lab and launch an independent career. Mediocre performance 
on the initial project would result in more incremental projects, intended to get the 
student safely out into the world. Ned articulated this strategy to me more than once 
as I struggled to manage students. My own interests and way of attempting to do 
science meant I couldn’t consistently implement Ned’s strategy; to a certain extent I 
consider Ned’s model a more general model for management and even childraising. 
Ned’s strategy provides a scaffolding, by which a neophyte can gradually find their 
way to independence, and I am still trying to consciously apply Ned’s model, or 
adaptations of it, to different aspects of my life. 

Ned was also vigilant and diligent with respect to the fortunes of young academics 
outside of his lab. His DNA nanotechnology track at the FNANO conference in 
Snowbird, Utah, was notoriously difficult to get into. Submissions for talks by senior 
researchers were routinely shot down. However, for any postdoc going out for 
faculty job talks or assistant professor giving a tenure tour, Ned would make special 
dispensation to get them a talk in his session. 

Personally, I still haven’t found a set of mechanics by which I can sensibly function 
within academics, or more generally, navigate the daily vicissitudes of life. However, 
I find that, independent of any details, Ned’s ethos and approach to science and 
life have deeply affected me. His being, and now having been, helps prop up a set of 
values that I would like to keep, and I feel very grateful to have known him. 

Greg Petsko: I will always remember Ned’s case because it included what is 
perhaps the most endearing description I’ve ever seen in a support letter. In his 
statement, a good friend of Ned’s wrote: “Ned dresses and looks like the mountain 
man Grizzly Adams—or, more precisely, like Grizzly Adams on a bad hair day. His 
beard gives the impression that a small mammal is eating his face. His language can 
be, on occasion, shall we say, colorful (only my drill sergeant could use profanity 
more imaginatively). He is as honest as the day is long, wears his heart on his sleeve, 
and cares about what should be cared about. You may ask what any of that has to 
do with his science; I would answer that it is inseparable from his science. Someone 
more conventional in their appearance or more concerned about how they appeared 
to others would probably have followed a trend rather than created one. Someone 
less well-centered in terms of character might have been tempted to cut corners to 
achieve rapid success or to overhype their work. Ned would never do either”. 

Niles Pierce: Ned was an inspirational scientist, gruff and humorous with a twinkle 
in his eye, and irreplaceable and warmly remembered. 

Grzegorz Rozenberg: On November 16, 2021, I lost a dear friend and the scientific 
community suffered an irreplaceable loss: Ned Seeman passed away. I met Ned 
during the early days of DNA-based computation (this is the name that we used 
then) and, as both of us stated many times, there was an instant special chemistry 
between us. We certainly enjoyed being together as well as exchanging emails. He 
was a great scientist and a wonderful human being: curiosity driven, unpretentious, 
and genuinely natural. We never collaborated on research, but during the DNA11 
conference in London, Ontario, in 2005 we had a long discussion on a topic that 
I suggested for our collaboration, and Ned was really positive about it. We agreed 
that I would visit him on one of my commutes between Leiden and Boulder so that 
we could work on this topic, but, regretfully, this never happened because of our 
very busy lives. We collaborated intensely on creating the International Society for 
Nanoscale Science, Computation and Engineering (with Ned becoming the founding 
president, while I served as the founding vice president). He was very enthusiastic 
about this project, and I witnessed his impressive professionalism and efficiency. We 
had a lot in common. For example, he enjoyed good jokes, and I always had a good 
supply of them. Also, he was an art lover. It is well known that he loved Escher, but 
he also liked Hieronymus Bosch. Since I have studied the paintings of Bosch (for 
over 50 years by now), I was able to explain many aspects of them. I even advised 
him which room in the Prado museum in Madrid he should visit (and why) and, 
indeed, he went there. As a matter of fact, I gave him many images of paintings 
by Bosch, which he clearly appreciated. We also talked a lot about magic. I am a 
performing magician, and Ned was very enthusiastic about my card magic. I told 
him about an important similarity between top magic and top science: both aim to 
approach the impossible as close as possible. He liked this explanation a lot. In fact, 
in our correspondence he often referred to my artist name, Bolgani, for example by 
writing “All the best to you and Bolgani”. When we exchanged new year wishes for 
2020, he was hoping to travel (he expected to go to DNA 26 and to China) so we 
could meet again, and I could present to him my new magic show. Sadly, this did not 
happen. Let me end by saying that while Ned has vanished, his friendship will stay 
with me—this is what the magic of a true friendship is about. 

Lulu Qian: The first time that I met Ned was at the DNA 10 conference in Milan. I 
was a first-year graduate student who had just found the field of DNA nanotechnology 
and molecular programming. When I walked into the lecture hall, my attention was 
immediately drawn to a man sitting on the floor, immersed in a laptop held on his 
knees and a backpack sprawled next to him. I would have mistaken him for an 
undergrad who couldn’t find a seat in a popular class, except that he had distinctive 
gray hair and a gray beard, and there were plenty of empty seats in the room. It was 
a memorable scene to me, and it became much more memorable later, when I found 
out that he was Ned Seeman, the father of the field. Since then, I have associated 
the image of a scientific giant with the image of a man who sits on the floor and 
disappears into his own world of thinking in a busy crowd. 

Damien Woods: The two memories that come to mind are unpublishable. First 
one. Ned: “The way to kill a good project is give it to someone who doesn’t move 
it forward”. Second one. Young postdoc who (perhaps nervously, or at least quietly) 
approaches Ned with a criticism couched as a question: “I really like your paper in 
[major journal] but I have a question about how you analysed the data”. Ned, gruffly: 
“That paper is a piece of shit”. I’m paraphrasing the first. Not so much the second. 

Andy Ellington: This is almost certainly not appropriate, but: one of the things I 
most loved about Ned was how profane he was. He was brilliant, he was erudite, he 
was artistic, and he could curse a blue streak. This was a man who had no fucks left 
to give about anything other than what was true and good and real and who knew 
exactly who he was. I miss him greatly. 

Phil Lukeman: When Sir Christopher Wren, the great architect, physicist, and 
polymath, died, his epitaph was displayed in his magnum opus, St. Paul’s Cathedral. 
The epitaph reads, “Si monumentum requiris circumspice” translated roughly as “If 
you would seek my monument, look around you”. In my opinion Ned’s legacy is 
comparable to Wren’s—a legacy of de novo design principles for, and the construc- 
tion of, startlingly elegant nanoscale objects. The objects themselves can’t be seen 
with the naked eye of course; but the impact of his work in the field he founded—in 
the scientific literature and in the hundreds of scientists he trained and inspired— 
absolutely can. This volume is a small sampling of that impact. If you would seek 
his monument, look around you. 


Preface 


The idea of this book arose in response to a short paper by Ned Seeman, “DNA 
Nanotechnology at 40”, published in Nano Letters in February, 2020. The reference 
point is, of course, Ned’s 1982 paper on “Nucleic acid junctions and lattices”, or 
perhaps the moment two years earlier in an Albany bar when he first made the 
connection between DNA 6-arm junctions and Escher’s woodcut Depth. Either way, 
skipping forward roughly 40 years, Ned concluded his assessment with a humble but 
optimistic observation: 


So where is the field going? The short answer is “I DON’T KNOW”. [ . . . ] Every day I 
open a journal and I’m surprised by another unit of progress in the nanoscale control of the 
structure of matter that DNA nanotechnology offers. I like being surprised that way. 


We—and we are bold enough to speak both for ourselves as individuals and for the 
community as a whole—don’t know where the field is going either. However, we 
took Ned’s article as a sign that the time was right for the community as a whole 
to take stock of where we are and where we’re going. Coincidentally, Ned’s 75th 
birthday was coming up in December 2020, and we thought that starting this project 
would be a suitable way of honoring him and his contributions to science. Thus this 
project was born. 

Our goal was to have a mixture of technical and non-technical material, reviews, 
tutorials, perspectives, and open questions—content that may not have a natural home 
in a journal but that may inspire current researchers to sit back and think about the 
big picture and that may entice new researchers to enter the field. What are the untold 
stories, the unspoken concerns, the underlying fundamental issues, the unappreciated 
opportunities, the unifying grand challenges? What will help us see more clearly, see 
more creatively, or see farther? What are the things going on right now that may pave 
the way for the future? 

The contributions we received have been grouped into five parts. 


Perspectives. The discussion is kicked off by Simon Vecchioni, Ruojie Sha, 
and Yoel Ohayon, three of Ned’s collaborators at New York University. Having 
between them over four decades of experience in Ned’s laboratory, they illustrate 
concretely how Ned’s concept of “semantomorphic” science—how the information 
in nucleic acid sequences gives rise to semantic meaning that directs formation 
of morphology—has guided and will continue to guide DNA nanotechnology in 
the future. This is followed by an essay on non-equilibrium molecular systems by 
Fritz Simmel, where he assesses both the exciting potential of molecular machines 
with dynamic behaviors—all the way to robots—and the challenges that have made 
progress in this direction so much slower than in structural DNA nanotechnology. An 
important subclass of dynamic behaviors is circuit computations, and developments 
in this area are reviewed by Fei Wang, Qian Li, and Chunhai Fan. They discuss 
issues such as scalable architectures, massive parallelism, randomness and search, 
spatial organization, and repeated or continuous operation. Satoshi Murata gives an 
overview of research in Japan, which started in the mid-1990s as DNA computing, 
grew steadily into molecular programming, expanded to molecular robotics, and now 
pioneers a vision of molecular cybernetics. In perhaps the most personal essay in 
this volume, Thom LaBean reminisces about his path from the science of random 
sequence proteins and protein engineering to the world of DNA nanotechnology and 
characters and concepts he met along the way. 

Chemistry and Physics. DNA nanotechnology studies the actions of informa- 
tion in the material world, so understanding and exploiting the properties of the 
material world is at the heart of everything we do. Greg Tikhomirov asks whether 
this foundation is restricted to the chemistry and physics of nucleic acids per se, 
or whether a more general foundation can be found by generalizing to novel engi- 
neered information-bearing polymers—and how that might open the door to new 
applications. Rikke Hansen and Kurt Gothelf consider a very specific application: 
self-assembly of complex molecular electronic circuits. Here the strategy is to use 
information in DNA to direct the self-organization of homopolymers and oligomers 
that have useful electronic properties as single molecules. David Walker, Eric Szmuc, 
and Andy Ellington also explore using DNA to organize electronic circuits, but with 
different types of conductors: metallized DNA, carbon nanotubes, and more! In 
perhaps the most ambitious proposal in this volume, Bernie Yurke gives a theoretical 
outline for how DNA-organized layouts of optically responsive dyes could serve as 
the basis for quantum computing circuits. 

Structures. As reviewed by Fei Zhang, nucleic acid nanotechnology is rapidly 
approaching the point where complex finite structures as large as a bacterial cell 
can be engineered and where periodic designer crystals can be grown large enough 
and robustly enough to serve the variety of applications envisioned by Ned four 
decades ago. Beyond applications, the structures that assemble in DNA nanotech- 
nology have fascinating resonances in mathematics; Ellis-Monaghan and Jonoska 
emphasize connections to topology, graph theory, and algebra. Resonances with arts 
and crafts can also be quite striking—sometimes not only beautiful, but also scientifi- 
cally insightful, as is elegantly articulated by Cody Geary for the case of paper-folding 
origami and cotranscriptional folding of RNA. A paper by Pierre Marcus, Nicolas 
Schabanel, and Shinnosuke Seki goes further, envisioning a theoretical model wherein 
the kinetics of cotranscriptional RNA folding amounts to a programmable computa- 
tion. A third theoretical contribution, from Matt Patitz, reviews the impressive range 
of models and questions that have been explored to understand the programmable 
self-assembly of molecular components called tiles. In probably the most abstract 
chapter of this volume, Jack and Robyn Lutz muse about a form of reasoning— 
the counterfactual “as if”—that has proven to be surprisingly powerful for 
understanding deterministic processes both in tile self-assembly and in distributed 
computing. 

Biochemical Circuits. Beyond using information in DNA to direct the self- 
assembly of molecules into structures, DNA nanotechnology allows the principled 
construction of systems of molecular machines that dynamically interact with each 
other—otherwise known as information-processing circuits. While over the past 
decades, DNA strand displacement circuit implementations have gradually grown 
from a few to a few hundred computing gates, Yuan-Jyue Chen and Georg Seelig 
present a vision—with preliminary experimental results!—for scaling up circuit size 
by several orders of magnitude using array-based synthesis and high-throughput 
sequencing. Building on this approach, Luca Cardelli outlines a theoretical appli- 
cation circuit: recording the relative timing of a large number of distinct molecular 
events. In a software-oriented article, Matt Lakin, Carlo Spaccasassi, and Andrew 
Phillips give an overview of the past ten years of development of the Visual DSD 
system for specifying, analyzing, and compiling DNA strand displacement programs 
(and beyond). Their experience and vision are full of insights for the future of molecular 
programming. 

Spatial Systems. Biochemical circuits have been studied predominantly in the 
well-mixed limit, where signals are carried by spatially uniform concentrations of 
distinct molecular species. Guillaume Gines, Anthony Genot, and Yannick Rondelez 
review experimentally demonstrated biochemical circuits that are spatially 
non-uniform—whether in reaction-diffusion systems or in droplets or other forms of 
compartmentalization—and discuss the potential for parallelism in computation and 
evolution. While Gines et al. touch on the connection between ecological systems 
and chemical systems—discussing a reaction-diffusion system that is an analog of a 
predator-prey ecosystem—Ming Yang and John Reif take this metaphor much further 
in a theoretical exploration of surface-bound single-molecule DNA nanorobots 
whose interactions are programmed to have collective behaviors similar to social 
insects. While Yang and Reif employ a stochastic cellular automaton model wherein 
each cell represents a single molecule, Masami Hagiya and Taiga Hongu take a larger- 
scale view wherein each cell represents a region of space within a programmable 
gel, and they propose algorithms for maze solving and for learning Boolean circuits. 
Returning back from theory-land to the realm of the real, the final chapter of this 
volume—by Beatrice Ramm, Alena Khmelinskaia, Henri Franquelim, and Petra 
Schwille—showcases the incredible and beautiful spatiotemporal patterning that 
develops on reconstituted lipid membranes in the presence of the E. coli MinDE 
system, delicately modulated by DNA origami nanostructures. 


Of course, the logistics of putting a volume together, the disruptions of the 
pandemic, and the fact that so many of us are overcommitted means that many 
voices and topics that could have been added in this volume are, alas, missing. 




Nevertheless, as incomplete as they might be, the diversity and depth of the contri- 
butions in this volume attest to the rich accomplishments and continued potential of 
Ned Seeman’s vision for DNA nanotechnology. We can see that the past 40 years 
brought us from the concept of programmable DNA crystal hosts that organize guest 
molecules, to the practice of 3D DNA crystals, complex micron-scale structures, 
reconfigurable nanomechanical devices, programmable computing circuitry, primi- 
tive molecular robotic systems, and spatially organized materials—accompanied by 
increasingly sophisticated software and mathematics. And the next 40 years? We 
don’t know! From the discussions in this volume, we might expect that integrating 
the many aspects of DNA nanotechnology will result in a dynamic, non-equilibrium, 
autonomous, cell-scale, molecular robot that can be programmed to sense, compute, 
and act. Or... more likely, we will be surprised! 


Pasadena, CA, USA Erik Winfree 
Tampa, FL, USA Nataša Jonoska 
November 2022 
Perspectives


Beyond Watson-Crick: The Next 40
Years of Semantomorphic Science


Simon Vecchioni, Ruojie Sha, and Yoel P. Ohayon 


Abstract It should come as no surprise that the world of DNA nanotechnology is still
learning how to fully master the different steps of the self-assembly process. Semantomorphic
science, as the late Ned Seeman would describe DNA nanotechnology, relies
on the programmability of nucleic acids (semanto-) to encourage short oligomers to
put themselves together (-morphic) into designed architectures (science?). In the
same way that Gibson assembly frustrates the molecular biologist, semantomorphic
self-assembly has for decades defied, and continues to defy, the scientist in question. In
a brief analogy, Gibson assembly can be thought of as enzymatically directed self- 
assembly [1] that follows the same general rules as Seeman assembly: (1) guess 
conditions; (2) set up reaction; (3) pray to entity of choice; (4) check result; and 
(5) repeat as needed. In other words, when it works, it works well; when it doesn’t, 
troubleshooting the sticky-ended cohesion between too-large or too-small building 
blocks with imperfect assays can take months. Returning to semantomorphic science, 
it is still mesmerizing that any of this works at all, and for that, we owe our deepest 
gratitude to Ned and his generations-spanning vision. 


1 A Brief Retrospective 


If we look back at the last forty years of work done in Ned’s Lab, we can roughly 
break this period into decades, anchored to the founding ideology. In 1982, Ned 
proposed the idea of using self-assembling DNA oligomers as crystalline scaffolds 
for the structure determination of biomolecules [2]. By 1991, with Junghuei Chen’s 
development of the first complex topological object, a DNA cube, it was apparent
that Ned’s vision could be possible [3]. In 1998, Ned worked with Erik Winfree to 
establish that periodic 2D DNA crystals were attainable through self-assembly [4]. 
This study and the ensuing cluster of papers on 2D lattices [5-9] may be described
as a time when it was clear that Ned’s vision should be possible. The following 
decade brought nanomechanical devices (Chengde Mao’s B-Z device [10], Bernard 
Yurke’s nano-tweezers [11], and Hao Yan’s PX-JX2 actuator [12], among others),
DNA origami [13] and LEGO-ology [14], and a vast proliferation in the complexity of 
self-assembled shapes and devices. In 2004, Chengde Mao published the tensegrity 
triangle motif [15], which Jianping Zheng modified in 2009 to self-assemble into 
the first 3D macroscopic DNA crystal [16], demonstrating that rigid, simple, and 
programmable self-assembly could be carried out in 3D: Ned’s vision would be 
possible after all. 

The start of the 2010s brought about an expansion of complexity: devices involving 
crystalline logic gates [17], self-replicating DNA machines with Paul Chaikin [18], 
and natural DNA computing with Natasha Jonoska [19, 20]; and for the first time, 
these devices were no longer able to be exchanged between laboratories in the same 
way—the outputs and techniques had now become so complex that composable 
parts were no longer as simple or modular. Rather than a single motif or reaction, 
DNA nanodevices had begun to involve a vast number of moving parts, atypical 
constituents, and specialized conditions. Describing Hongzhou Gu’s nanoparticle 
assembly line [21] for grant applications remains difficult to this day, let alone the 
modifications necessary to change its function and the critical role that this device 
plays as a use case for DNA nanotechnology. The current state of semantomorphic 
science in the Seeman Lab, with the anticipated publication of our first biomolec- 
ular structures attained through 3D DNA crystals (manuscript in preparation at the 
time of writing), is that Ned’s vision is possible. Not only is it possible, but the 
feedback gained through modification of simple parts and predetermined solutions 
to the diffraction phase problem can allow rapid screening of a target library in a 
way that is impractical using state-of-the-art crystallography. While the advent of 
cryogenic electron microscopy has already shaken traditional crystallography with 
promises of imminent obsolescence, there are use cases for Ned’s 3D diffraction 
lenses that we believe are critical for the next 40 years—particularly in the realm of 
non-canonical, non-Watson-Crick architectures, which are chronically overlooked 
by dogmatic crystallographic science. Watson-Crick interactions can be, in broad 
strokes, defined as double helical interactions involving G:C and A:T base pairs. 
From this point on, we believe that eclipsing Watson-Crick semantics will allow for 
technologically relevant morphologies, i.e., semantomorphic science as Ned once 
envisioned it. 
2 A Science Allegory


Ned’s last publication in 2021 was Brandon Lu’s hexagon [22], a structure that 
can in many ways be thought of as a representation of DNA nanotechnology more 
broadly. In the outward-facing story of the hexagon, it was discovered that use of 
non-Watson-Crick sticky-ends in a tensegrity triangle caused non-canonical cohesive 
interactions involving A:C and G:T. Rather than inhibiting self-assembly, this non- 
Watson-Crick “mismatch” drives the growth of much larger and, apparently, more 
thermodynamically-favored crystals with bent Holliday junctions, or branch points. 
The crystals possess a pseudo-infinite hexagonal channel along the crystallographic 
P6₃ screw axis, presenting a technological opportunity for nanoscopic organization
of composable parts in a metachiral system (see Sect. 4 for elaboration). 

Preceding this publication came years of head-scratching and some tunnel vision 
by way of optimism. Ned once said that the use of a microscope—any kind of 
microscope—was a heuristic trap: the observer would, more often than not, see 
whatever they were looking for. In the world of self-assembly, this led to almost fifteen 
years of curating rhombohedral—and only rhombohedral—crystals. Lurking in the 
corners of thesis figures and oral histories since the first days of Ned’s 3D assemblies 
are needle-like, chunky crystals that appeared at first glance to be failure products. 
As the designs in our laboratory became more complex, the chemistry further from 
Watson-Crick, and the geometries more distant from B-form double helices, we began 
seeing crystals with jagged, improper shapes and space group “failures” in the X-ray 
diffraction patterns (see Fig. 1). Hao Yan published a similar study in collaboration 
in 2016, in which a “tensegrity square” became an unwound, flower-like design with 
P3₂ symmetry [23]. In the summer of 2020, the conundrum of the hexagon began a
rapid convergence. Updated processing software and the simultaneous result of three 
separate projects with P6₃ symmetry made it clear that our “accidents” and “failures”
were telling us something. In abject disregard for Ned’s adage “garbage in, garbage 
out,” Brandon Lu and Karol Woloszyn—two extremely talented researchers in the 
laboratory—on the same day obtained molecular replacement results that showed 
tensegrity triangles arranged in a chair-hexagon conformation. 

It was at that moment starkly clear that self-assembly had produced something 
different: packing effects had imposed curvature on the double helix to form a 
corkscrew-like architecture from a linear monomer. Nature finds a way, especially 
when trying to impose Watson-Crick symmetry on strained oligomers engaging in the 
dubious process of self-assembly. But it is precisely this departure from Watson-Crick 
that uncovered a new mode of self-assembly—a change to DNA primary sequence 
had profound implications on the quaternary structure of those oligomers. To this 
end, semantomorphic science may find its next relevant morphologies by altering its 
core semantics and by attending to the unexpected results. 


Fig. 1 Self-assembly of tensegrity triangles [24]. a-c Various crystallization conditions generate 
rhombohedra (orange), hexagons (green), and spherulites (blue) in the same drop, despite general
optimism for rhombohedral assembly. d Two identical hanging drops on the same glass slide, with 
hexagonal crystals (left) and rhombohedral crystals (right) attained simultaneously. It is clear that 
macroscopic effects can result from minute differences in the crystal microenvironment 


3 A Roadmap 


We envision the departure from Watson-Crick architectures through (1) modification 
of DNA semantics by way of new base pairing rules; (2) the augmentation of DNA 
syntax leading to diverse secondary and tertiary structures; and (3) the expansion of 
the DNA operating system through modifications to the sugar-phosphate backbones. 
We elaborate upon these categories below. 


3.1 DNA Semantics: Schrödinger Crystals Versus Seeman 
Crystals 


In meetings and conferences, Ned would often repeat that “symmetry is the opposite 
of control!” As is well known, the very first step toward designing and programming
synthetic branched junctions involved the development of SEQUIN, a FORTRAN 
algorithm explicitly implemented for symmetry minimization [25]. In a computer 
science sense, this can be thought of as the greatest common substring optimization
algorithm, and in a biochemical sense the free energy minimization of desired 
secondary structures. Schrödinger posited in his 1951 treatise, “Was ist Leben,” 
that the fundamental nature of life required an aperiodic crystal as the carrier of 
genetic information [26]. Only by minimizing repeating segments of information in a 
geometrically isomorphic crystal would living systems be able to propagate the stored 
information between generations. Schrödinger’s postulation preceded SEQUIN by
four decades, but proposed a similar principle of symmetry minimization contained 
in topological units unperturbed by the information carried within. 
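
The idea of symmetry minimization can be illustrated with a minimal sketch. The Python function below is not SEQUIN itself (which was written in FORTRAN and whose details are not reproduced here); it only flags subsequences of a chosen length that occur more than once in a set of strands, which is the kind of repetition a symmetry-minimizing design tries to avoid. The function name `repeated_kmers`, the window length `k`, and the example strands are assumptions made for illustration.

```python
from collections import Counter


def repeated_kmers(strands, k):
    """Return every length-k subsequence that occurs more than once across all strands."""
    counts = Counter(
        strand[i:i + k]
        for strand in strands
        for i in range(len(strand) - k + 1)
    )
    return {kmer: n for kmer, n in counts.items() if n > 1}


# Hypothetical example strands (not taken from any published design).
example_strands = ["ACGTTGCA", "GGACGTCC"]

# With a window of 4 the two strands share one subsequence;
# with a window of 5 every subsequence is unique.
print(repeated_kmers(example_strands, 4))
print(repeated_kmers(example_strands, 5))
```

A practical design tool would presumably also compare each subsequence against the reverse complements of all the others, so that only the intended pairings remain; that step is omitted here for brevity.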

By contrast, Seeman crystals can be thought of as mostly periodic crystals, 
whose sequences repeat (tessellate) in the semantomorphic bundles that we call
motifs. These motifs are subject to crystallographic symmetries to generate theo- 
retically infinite nanomaterials. Within this system, local symmetry minimization 
is critical in imposing long-range order: Schrödinger’s aperiodic crystal must be
tightly constrained. The tensegrity triangle, Ned’s crowning joy, contained a peri- 
odic unit with only 42 nucleotides (nt). Geometric asymmetry, in the form of 3:3:1 
annealing ratios of three interwoven oligomers, counteracts the rotational symmetry 
of the triangle arms to allow for sequence symmetry down to 42 nt (Fig. 2a vs. 
Fig. 2c). A similar motif with asymmetric sequences across the triangle and no rota- 
tional symmetry between arms contains 126 nt in each periodic unit for the same 
unit cell, this time with symmetric strand ratios between seven unique oligomers 
(1:1:1:1:1:1:1) (see Fig. 2d). There is a clear exchange between geometric and 
sequence symmetries—penalties in sequence minimization require more complex 
geometry, which in turn makes experiments more difficult and the analysis more 
complex. It is clear that Schrödinger’s prescription holds for DNA nanotechnology
and can be amended for Seeman crystals: the more aperiodic a crystal’s sequence, 
the more symmetric the geometry can be. With this in mind, expanding the genetic 
code will have a clear effect on the types of structures that nanotechnologists can 
engineer. 

There are many existing strategies for expanding “vanilla DNA”—as Ned would 
describe Watson-Crick chemistry. The most obvious lies in nucleobase analogs, 
which typically employ rearranged hydrogen bonding groups: inosine (I) [24], isocy- 
tosine (S), isoguanine (B) [27], 2,6-diamino-purine, 2,4 diamino-pyrimidine [28], 
and “hachimoji” nucleotides dZ and dP [29]. Indeed, very extensive applications of 
these nucleotides have been carried out by generations of Ned’s students, and it is 
clear that they satisfy the requirements for Seeman crystals [30]. The full imple- 
mentation of an expanded lexicon has, to our knowledge, not been carried out in a 
self-assembly context. 

A further development lies in metal-mediated DNA base pairing (mmDNA),
which has emerged over the last two decades as a viable means to add symmetry 
(reducing the aperiodicity of DNA crystals) by allowing non-Watson-Crick pyrim- 
idine:pyrimidine stabilization through the coordination of diverse metal ions (most 
commonly Ag⁺ and Hg²⁺) [31, 32]. The payoff here lies in reducing the regularity
of the overall behavior—metal base pairs are known to be pseudo-covalent, stabilize 
the helix, and add novel electronic and magnetic properties [33-35], all of which 
[Fig. 2 panel labels: a R3 space group, periodic unit 42 nt with rotational symmetry; c P1 space group]

Fig. 2 Tensegrity triangle motifs. a-b The symmetric two-turn triangle developed by Jianping 
Zheng employs a repeating 42 nt unit that attains higher-order complexity by asymmetric strand 
ratios (strands S1-S3) and sequence weaving. c-d The asymmetric version of this motif, developed
by Rachel Chernet, only repeats over 126 nt, but involves easier annealing with symmetric and
logical strand ratios (strands S1-S7)


represent a general departure from hydrogen bonding as the sole means of imple- 
menting Schrödinger and Seeman crystals. By adding bioinorganic diversity to the 
properties and behaviors of nucleic acids, there lies an opportunity to increase the 
complexity of DNA structural motifs—more than compensating for the increase in 
information entropy. A main focus of the work in Ned’s group lies in these types of 
architectures, and we look forward to sharing Ned’s latest works with the community 
in the coming months and years. 

We envision that the expansion of the coding language of DNA will enable more 
precise, geometrically minimized, and behaviorally complex semantomorphism, and 
we have only begun to scratch the surface of these techniques. 


3.2 DNA Syntax—Information Bundles and Secondary 
Structures 


With the advent of the first functional paranemic crossover (PX) motifs by Zhiyong Shen
[36], Ned found a way to augment the complexity of DNA branches (Fig. 3b). Beyond 
topologically-closed knots and catenanes [37], PX motifs represent a wonderful
conundrum of the double helix that has driven the creation of new dynamic devices. 
For years, Ned tried to convince biologists to utilize PX, to search for it in biological 
systems that Xing Wang demonstrated [38]. The utility of PX DNA was made clear 
with the PX-JX2 device, first designed by Hao Yan in 2002, in which geometric
control was attainable through the organization of DNA within differentially shaped 
information packets [12]. Through the addition of small changes attained through 
strand displacement, it was possible to reorganize the whole semantomorphic structure;
this represents a fundamental change in the grouping and behavior of information,
in the same way that syntax dictates the order and function of words in a
sentence (Fig. 3). To this end, expanding the syntax of DNA nanotechnology will 
enable more complex materials in the coming decades. A road to syntactic expansion
follows Ned’s ideas, involving hairpins, triplexes, quadruplexes, and secondary
structures yet to be discovered or developed.


Fig. 3 Archival chalk 
drawings from Ned’s office 
of various DNA secondary 
structures. a Duplex DNA; b 
PX DNA; c JX2 structure; d
PX DNA with hairpins 
The hairpin was one of the earliest tools in the semantomorphic book of style.
Hao Yan used this technique to give a 2D crystal the power of nano-actuation: by
interfacing with strand displacement, hairpins can act as geometric parentheses, putting
topological information on hold until it is needed [39]. Hao’s motif changed shape 
like an omni-directional accordion, performing nanoscale work stored in hairpin 
parentheticals. Hairpin technology has been extensively explored, but likely has new 
tricks and interfaces to tell us, especially when implemented with other syntactical 
elements. 

Triplex-forming oligomers (TFOs) exploit the major groove of pyrimidine- 
enriched DNA sequences to bind a third strand. The overall architecture of the 
DNA is likely altered to some degree, as Seeman crystals do not form with TFOs 
during self-assembly—there is, however, no difficulty in decorating these crystals 
with TFOs after the fact. In this way, the effect of a TFO is not unlike an exclamation 
mark—placed properly it can manifest a new behavior in an existing sentence. This 
technology, developed in collaboration with David Rusling, has been interfaced with 
various attachment chemistries [40] and shows strong promise for adding reporter 
and sensor behavior to semantomorphic structures. A complete structural description 
of TFO-bound DNA nanostructures has not been attained, but it is clear that these 
motifs will enhance and emphasize DNA technologies in the coming years. 

It is known that repeats of guanine bases can interrupt the double helix by forming 
a weaving tetraplex structure (G4), and Ruojie Sha and Nongjian Tao built on this 
idea to show that G4 molecules are excellent electronic conductors [41, 42]. Unlike 
Watson-Crick DNA, which could be generously described as a weak resistor (or a 
fantastic molecular heater), G4 motifs are now known to be wide-bandgap semi- 
conductors. In essence, these structures bind alkali cations between poly-guanine 
stacking planes and impart electronic functionality to an otherwise inert DNA. This 
structural organization changes both the orientation and the underlying electron delo- 
calization of the material—in this way, G4 acts as a semicolon. To be useful in a 
semantomorphic sense, it must come in the middle of a DNA sentence, but it changes 
the focus and structure of the surrounding information. One does not use a semicolon 
lightly; and conversely, G4 must be used sparingly for its symmetry penalties and 
strong departure from Watson-Crick helical parameters. The implementation of G4 
within structural DNA motifs has been sparing, but as designs become more complex, 
we can expect G4 to become increasingly common in the future. 

There exist other secondary structures that rely on semantomorphism [43], such 
as aptamers [44], i-motifs [45], and of course, paranemic cohesion (PX motifs). 
The future of DNA-based technologies will strongly rely on sequence context 
and behavior, and the ability to code for novel heterostructures is a powerful and 
underexplored aspect of semantomorphic science. 
3.3 Nucleic Acid Operating Systems: XNA and Beyond


If polymerases can be considered the read and write heads of the DNA hard drive, 
the metaphor can be extended to consider the sugar-phosphate backbone as the oper- 
ating system in which nucleobase information is stored. Entire classes of enzymes are 
employed to extract DNA information into the RNA operating system and further- 
more translate that information into peptides. Alterations to the nucleic acid operating 
system generate what are known as xenobiotic nucleic acids (XNAs) [46], and they 
offer what any orthogonal digital platform can: data encryption, novel behaviors and 
features, and the need to rewrite basic programs. 

Encryption is a result of enzymatic orthogonality—with a different form of 
aperiodic crystal, information can neither be read nor copied and therefore cannot 
escape from a synthetic biological system encoded using XNA [47]. This concept 
has powerful implications for synthetic polymers and synthetic life. The novel 
features result from the chemical modifications present in XNA via the addition 
of hydrophobic groups, linkers, and various anchors and substituted sugar moieties 
[46], leading to different interactions with the surrounding solvent and ultimately 
presenting the opportunity to program interfaces with chemistry beyond neutral 
(physiologically adjacent) aqueous systems. Finally, a departure from Watson-Crick 
geometry changes the shape of the ensuing molecule, including the helical period, the 
radial dimensions, and the overall pitch of the nucleobases. In order to be successful 
in achieving nanotechnological programming in an XNA system, the connection 
between semantics and morphology must be painstakingly re-established for each 
polymer. The benefits of this expansion have been clear for some time, and work 
toward structural DNA(+) nanotechnology has been carried out robustly for at least 
a decade. 

Cody Geary and colleagues demonstrated this concept by folding RNA nanos- 
tructures while they were transcribed from a DNA template [48]. This represented a 
fundamental shift toward “living nanotechnology” or “in vivo nucleic acid nanotech- 
nology” by showing that the RNA operating system could be (occasionally) relied 
upon to generate semantomorphic structures. Ned implemented true XNA backbones 
within double-crossover motifs, measuring the double helical periodicity of peptide 
nucleic acids (PNA) [49] and 2’-fluoro-deoxyribonucleic acid (F-DNA) [50]. PNA 
has subsequently been shown to operate well in organic solvents [51], which have
long been considered a barrier to self-assembled architectures predicated upon hydrogen
bonding. 

Ned and Jim Canary at NYU have been collaborating since 1997 to attach organic 
polymers to the sugars of 2’-O-propargyl-modified ribonucleotides. This approach 
employs a sterically-unhindered linking site to “plug in” a variety of molecules to 
the double helix. This was most elegantly demonstrated by Xiao Wang through 
polyaniline/emeraldine functionalization of a DNA cage that imparted optoelectronic
functionality into Seeman crystals [52]. This has been further extended into
nylon-DNA [53] and tertiary structure-like crosslinking [54]. 
Further orthogonal backbones of interest might overcome charge polarity in the
phosphates, impart an integer number of nucleotides per helical turn (to avoid the 
“curse” of 10.5 bp/turn), increase thermostability, frustrate endonucleases, provide 
a sequence-agnostic attachment strategy for nanomaterials, or even impart electrical 
or magnetic functionality to the polymer of life [46, 55]. It is clear that backbone 
modifications are an underexplored, technically difficult, but rewarding addition to 
nucleic acid nanotechnology. With any operating system change, the new environ- 
ment must be first explored, structurally characterized, and tested. As with PNA 
and RNA nanotechnologies, future XNAs can be expected to mature and allow a 
fully-elucidated and adaptable semantomorphic coding environment. 


4 Beyond Watson-Crick: A Call to Action 


The benefits of escaping Watson-Crick geometry are hopefully made clear: semanto- 
morphic science stands to gain (1) a much greater geometric diversity through a more 
versatile genetic code; (2) expanded diversity through syntactic groupings of bases 
employed in new secondary structures (see Fig. 4); (3) orthogonality to enzymatic 
and aqueous systems through altered backbone operating systems; and (4) an inte- 
gration of these various approaches to impart novel mechanical, material, electronic, 
or catalytic behaviors into semantomorphic polymers. As with DNA nanotechnology 
in its early days, the changes required to carry out this paradigm shift will require 
an immense amount of experimental exploration and community action; but, unlike 
the first forty years, the next forty years can rely on the incredible foundation laid 
out by the pioneering work of Ned Seeman. It is with the deepest gratitude that we 
remember Ned and his mission and the painstaking groundwork that he and his early 
team performed to create the field from scratch. With the guidance left in his writings 
and with his students, colleagues and collaborators across the globe, there exists a 
full and flourishing roadmap forward. In his work, “DNA Nanotechnology at 40,” 
Ned concluded “Every day I open a journal and I’m surprised by another unit of 
progress ... I like being surprised that way.” [56] Let us continue to surprise him. 
Fig. 4 Self-assembly beyond Watson-Crick: cross-section of 3D hexagonal DNA assembly, PDB
7R96 [22]. The periodic unit forms a tensegrity triangle (orange center, see Fig. 2) which packs 
into a star-like lattice. The space group P6₃ indicates a sixfold symmetry in each channel (see the
hexagon-like nature of the triangle arrangement) and that three layers of the material are required in 
the z-direction for the meta-helix to make a full turn around the cavity (the crystallographic “screw 
axis”). This discovery was a joyful surprise to Ned 
Abstract Dynamic DNA nanotechnology aims at the realization of molecular
machines, devices, and dynamic chemical systems using DNA molecules. DNA is 
used to assemble the components of these systems, define the interactions between 
the components, and in many cases also as a chemical fuel that drives them using 
hybridization energy. Except for biosensing, applications of dynamic DNA devices 
have so far been limited to proof-of-concept demonstrations, partly because the 
systems are operating rather slowly, and because it is difficult to operate them con- 
tinuously for extended periods of time. It is argued that one of the major challenges 
for the future development of dynamic DNA systems is the identification of driving 
mechanisms that will allow faster and continuous operation far from chemical equi- 
librium. Such mechanisms will be required to realize active molecular machinery 
that can perform useful tasks in nanotechnology and molecular robotics. 


1 DNA Nanotechnology: A Personal Account 


I stumbled into DNA nanotechnology at the turn of the millennium and have since
actively participated in about half of its 40-year history, which is celebrated here.
With a background in solid-state nanophysics—having spent time in clean rooms 
for the fabrication of semiconductor nanodevices—I was initially fascinated by the 
promise of “self-assembled electronic devices” that would become possible with the 
help of DNA’s magic programmable base-pairing rules in the future. In fact, I was 
completely unaware of DNA nanotechnology until I saw a talk by Uri Sivan from 
Technion in 1998 who, together with Erez Braun and co-workers, had just realized 
the first DNA-templated metal nanowires [1]. I actually first considered doing a 
postdoc with Uri Sivan, but during a visit at Bell Laboratories in 1999, Bernard 
Yurke introduced me to his work on “DNA tweezers” that he had started with Allen
P. Mills and Andrew Turberfield (who was on sabbatical at Bell Labs that year).¹
The vision of making artificial molecular machines from DNA molecules seemed 
so unconventional that I felt immediately attracted to it. Fortunately, Bernie Yurke 
offered me a position, but due to a delay in funding I could only start at Bell Labs in 
early 2000. Bernie and his co-workers were kind enough to let me do a series of control
experiments for the DNA tweezers paper [2], which was already under review at that 
time, and made me a co-author on this extremely important paper. During our work 
on strand displacement-based DNA machines I finally became aware of Nadrian 
(Ned) Seeman’s pioneering work in DNA nanotechnology—in fact, together with 
Chengde Mao he had already published a different type of DNA nanodevice in 1999, 
which was based on the B-Z transition of CpG rich DNA sequences [3]. 

A major initiating event for me in this period was the seventh International Work- 
shop on DNA computing in Tampa, FL, where for the first time I was exposed 
to the deep connections between theoretical computer science and self-assembly 
processes—I still remember a wonderful tutorial introduction by Erik Winfree—and 
also various ideas on computing in biological systems—topics that have immensely 
widened my view on the subject and have fascinated me ever since. 


2 Designing and Programming with DNA 


DNA nanotechnology is an extremely interdisciplinary field of research. Compared 
to many other fields, it also has a relatively low entry barrier for researchers with 
diverse scientific backgrounds. There are various reasons for this, but fundamentally
the unique properties of DNA molecules are responsible for it.


2.1 DNA—A Programmable Molecule 


DNA’s biological role is intimately connected with its molecular and structural properties,
which are heavily utilized in DNA nanotechnology. Of course, there is base-pairing:
adenine (A) pairs with thymine (T) and guanine (G) with cytosine (C), in
such a manner that A-T and G-C base-pairs have the same size in the context of the
double helix. This makes a DNA duplex a structurally very uniform heteropolymer,
which is biologically important because this allows DNA and RNA polymerase to 
run smoothly over it regardless of its sequence. Of course, this also means that infor- 
mation can be encoded in the sequence of base-pairs. As a double-stranded helix, 
DNA is mechanically relatively rigid. Binding between complementary strands is 
highly specific and depends on the base sequence. The thermodynamic properties 
can be determined in an approximately additive manner by just summing up the
contributions of nearest neighbors [4]; there are no long-range interactions between
distinct sequences along a double helix that complicate matters. Put in the
nanotechnology context: Double-stranded DNA is a rigid, information-encoding polymer with
uniform geometric properties. The interactions between DNA single strands can be
programmed by choosing the appropriate sequences.


¹ When I was looking for a postdoc position, I also considered switching to a more biological topic,
but several potential postdoc advisors intimidated me by saying “well, then you need to spend several
years to learn more about biology to get even started”. It turns out that DNA nanotechnology has a
relatively low entry barrier, which still is an attractive feature for young researchers.
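
The additive nearest-neighbor picture described above can be illustrated with a few lines of Python. This is only a minimal sketch: the stacking free energies in `NN_DG` and the initiation term `INIT_DG` are placeholder values of a plausible order of magnitude (kcal/mol), and the function names are hypothetical. For real design work, the published parameter sets cited in the text should be used.

```python
# Toy nearest-neighbor estimate of duplex stability.
# All numbers are illustrative placeholders (kcal/mol), not authoritative parameters.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Ten unique dinucleotide steps; the other six follow by reverse complementarity.
NN_DG = {
    "AA": -1.00, "AT": -0.88, "TA": -0.58, "CA": -1.45, "GT": -1.44,
    "CT": -1.28, "GA": -1.30, "CG": -2.17, "GC": -2.24, "GG": -1.84,
}
INIT_DG = 2.0  # assumed penalty for bringing two strands together


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def step_dg(step):
    """Look up a dinucleotide step, using its reverse complement if needed."""
    return NN_DG[step] if step in NN_DG else NN_DG[reverse_complement(step)]


def duplex_dg(seq):
    """Sum the stacking contributions of all neighboring base-pairs."""
    steps = (seq[i:i + 2] for i in range(len(seq) - 1))
    return INIT_DG + sum(step_dg(s) for s in steps)


for strand in ("ATATAT", "GCATGC", "GCGCGC"):
    print(strand, round(duplex_dg(strand), 2))
```

Because the estimate is a plain sum over neighboring base-pairs, changing one base only changes the two terms that contain it, which is what makes sequence design tractable.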

This is almost everything you need to know to get started with DNA
nanotechnology, which is quite astonishing given how much is known about
DNA. You can ignore the details in the beginning, but they become important later.
Because of its central role in life, DNA has attracted scientists from very different
disciplines: next to biologists, of course, chemists, who developed schemes to
synthesize and chemically modify DNA at will and who also came up with synthetic
DNA and RNA analogues. Physicists like to study the structure of DNA, its mechanics
and dynamics as a polymer, its thermodynamics, and its charge interactions [5, 6]. A
whole sub-discipline of computer science, bioinformatics, is devoted to the study
and comparison of DNA sequences. DNA nanotechnology greatly benefits from all
these achievements: automated synthesis, structural and thermodynamic information,
and computational tools are all available. This makes DNA nanotechnology
the most advanced molecular technology so far, and because it is sequence based,
it is also programmable in a relatively straightforward manner.


2.2 Learning by Building 


Researchers in synthetic biology often talk about the design-build-test-learn (DBTL) 
cycle and about understanding by building. In DNA nanotechnology, these principles
are in fact realized! The availability of computer tools combined with automated
synthesis and a wide range of established characterization tools enables even inexperienced
researchers to quickly create novel DNA nanostructures, study their behavior,
and use them in applications. When mistakes are made, one can relatively easily go 
through another DBTL cycle to improve the results. Importantly, this makes it pos- 
sible to find a good balance between rational design (taking into account the wealth 
of nucleic acids knowledge) and a learning-by-doing approach. 

Among the many fascinating achievements of DNA nanotechnology [7], DNA
origami is certainly the best known outside of the community [8, 9]. DNA origami utilizes
a long single-stranded DNA molecule termed the “scaffold” and a large number of
shorter staple strands that bind sequence-specifically to designed positions along the
scaffold. This creates links between these positions, which fold the scaffold into a
three-dimensional molecular shape. The same scaffold strand can be “programmed”
to adopt completely different shapes simply by the choice of the staple sequences.
Using a DNA origami design program allows researchers to design such origami
structures on the computer and choose staple sequences for synthesis that will later
physically assemble the desired shapes in the test tube [10]. An interesting development
sets in here: Designing DNA nanostructures using computer programs and
the corresponding DNA nanotech methodology represents a new skill set that draws
on, but is distinct from, that of traditional biophysics or biochemistry. As stated
before, you do not need to know every chemical detail about DNA to build
such structures. At the same time, origami researchers develop a “feeling” for what
can be built, and what not.
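
The way a staple “programs” the scaffold can be sketched in code. The example below is a deliberately simplified illustration, not how an actual origami design program works: the scaffold sequence, the positions, and the function `staple_for` are hypothetical, and helical geometry and crossover rules are ignored entirely. It only shows that a staple is assembled from the reverse complements of the two scaffold segments it is meant to link.

```python
# Toy illustration: a staple that links two scaffold segments.
# Scaffold, positions, and arm length are made up for illustration only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def staple_for(scaffold, first_start, second_start, arm_length):
    """Build a staple whose two arms bind two separate scaffold segments.

    The staple's 5' arm pairs with the segment at second_start and its
    3' arm pairs with the segment at first_start, so binding both arms
    pulls the two scaffold positions together.
    """
    seg_a = scaffold[first_start:first_start + arm_length]
    seg_b = scaffold[second_start:second_start + arm_length]
    return reverse_complement(seg_b) + reverse_complement(seg_a)


scaffold = "ATGCATGCAAAATTTTGGGGCCCC"  # hypothetical scaffold, 5'->3'
print(staple_for(scaffold, first_start=0, second_start=16, arm_length=4))
```

Choosing a different pair of positions yields a different staple and therefore a different fold of the same scaffold, which is the sense in which the shape is set by the staple sequences alone.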

Think of modern computer programmers—it is not required to have detailed 
knowledge about computer hardware (or electronics and solid-state physics, for that 
matter) to be able to write code in a high-level programming language and solve 
complex tasks with it. While it may be a little sad that programmers do not know 
how certain things work in detail—it is also a big achievement that they do not need to 
know! Modularization and abstraction help to make much bigger leaps much faster. 
And, more or less, this is what happens in DNA nanotechnology—this is the power 
behind the idea of molecular programming [11, 12]. 


2.3 Challenges and Limitations 


Already now the ability to create DNA-based nanostructures of almost arbitrary 
shape combined with the availability of a wide range of chemical modifications for 
these structures has found use in nanomedicine, biosensing, nanoscale science, and 
biophysics. In such applications, DNA nanostructures are typically used to spatially 
organize molecules and nanoparticles with nanometer precision. 

More or less obvious questions (and challenges) for the future of static DNA 
nanostructures regard the scale of the structures: Can we make bigger and bigger 
functional structures, in what quantities and at what cost [13]? Can we achieve greater
precision in the arrangement of molecules and increase the chemical diversity and
functionality of DNA nanostructures? For example, it would be great if one could use
modified DNA and nucleic acid analogues to rationally design DNA nanostructure-based
catalysts with similar catalytic versatility and power as enzymes. These challenges
are rather chemical in nature and will hopefully be addressed by the gifted chemists in 
the field (if we come back to the comparison with computer programming: chemists 
provide the hardware for DNA nanotechnology and thus define and extend the capa- 
bilities of the hardware-agnostic molecular programmers). 

In my opinion, a greater challenge lies in the realization of dynamic functions and 
the control and utilization of non-equilibrium processes. Over the past two decades, 
DNA nanotechnology has been concerned with the realization of molecular switches 
(such as the tweezers), machine-like devices (molecular walkers [14, 15], rotors [16], 
and the like), DNA-based chemical reaction networks (CRNs), and DNA comput- 
ers [17, 18]. There are different motivations for creating such systems: On a funda- 
mental level, one would simply like to learn how to build artificial molecular machines 
or generally realize dynamic molecular functions. Then, such systems should be of 
great use in nanotechnology—examples for applications of dynamic DNA processes 
already exist: enzyme-free DNA-based sensors (hybridization chain reaction [19], 
catalytic hairpin assembly [20]) and DNA-based super-resolution microscopy (DNA- 
PAINT) [21]. 
In several aspects, however, dynamic DNA systems are still very limited. Next
to their low chemical functionality (as for static structures, see above), their main 
restrictions are the difficulty of operating them continuously and autonomously and 
their relatively low speed. If one considers DNA nanotechnology as an approach to 
emulate and thus understand biological self-organization, these will be major hurdles. 


3 From Self-Assembly to Non-equilibrium Dynamics 
and Self-Organization 


Realizing dynamic, biology-inspired self-organizing systems, and building molecu- 
lar machinery in particular, involves the challenge of controlling the flow of chemical 
energy through a non-equilibrium chemical system to generate interesting dynamics 
and structure (cf. Fig. 1). 


3.1 Molecular Machines 


Molecular machines and devices made from DNA have usually been based on DNA
conformational changes (formation of duplexes, hairpins, triplexes, i-motifs, etc.) that
take place in the presence of a DNA input or in response to a change in environmental
conditions (salt, pH, light). Due to the nature of these stimuli, most such systems are
operated in a clocked manner: they are kicked out of equilibrium by a change in
conditions (addition of fuel), followed by relaxation to a new equilibrium state.

Fig. 1 DNA nanotechnology has very successfully generated nanostructures, which self-assemble
as the thermodynamically most stable structure of a mixture of components. Various molecular
machines and devices have been realized by periodically driving switchable molecular structures
out of equilibrium in a non-autonomous manner. In the future, DNA nanotechnology is anticipated
to increasingly operate further away from equilibrium, which will be of interest in various
contexts: accessing far-from-equilibrium self-assembled and dissipative structures; driving molecular
machines continuously and autonomously; realizing active matter and cell-like soft robotic systems
that can respond to their environment and quickly switch between different states of organization. A
specific challenge is the production and operation of nucleic acid devices inside living cells. In the
figure, A denotes some parameter that characterizes the deviation of the system from equilibrium

Continuous and autonomous operation out of equilibrium is challenging. An 
ingenious idea to achieve this was developed by Turberfield and Yurke and fur- 
ther developed by many others [22, 23]. DNA fuel can be forced into an unreactive 
secondary structure (like a hairpin loop), which can be activated via strand invasion 
by a DNA trigger strand. The activated sequence domains can then invade another 
inert fuel hairpin (or secondary structure). This principle can be used to store chem- 
ical (hybridization) energy in hairpin loops, which can be released in a dynamical 
DNA system in catalytic cycles or cascades. In essence, this is an example of kinetic 
pathway engineering: The inert fuels are present in a metastable state, which can only 
relax through reactions that drive a process of interest [23]. This principle has been 
used to drive autonomous molecular walkers and also lies at the heart of a variety of 
nucleic acid amplification schemes [24]. 

Even though these achievements are extremely impressive, the field of DNA 
machines has not yet “taken off,” and the capabilities of DNA nanomachines are
still far from those of biological machinery. DNA-based molecular machines are
slow, and they have not been used to exert appreciable forces to carry out tasks or move
objects over larger distances. Why is that? 

Using DNA as a fuel is one of the major conceptual developments in DNA 
nanotechnology—DNA is an information-bearing fuel and can simultaneously act 
as a molecular address code that switches only specific molecular processes. As 
can be seen from the applications that have emerged, this is particularly useful for 
biosensing applications and for the realization of chemical reaction networks. DNA 
fuels have not been very successfully used, however, to generate fast movements and 
quick conformational changes as found in biological molecular machines. 

Biology does not use an information-bearing fuel molecule—it uses a couple of 
small molecule fuels (ATP, GTP) for almost everything. These molecules are present 
at high concentrations (in the mM range), are constantly (re)generated by cellular 
metabolism, and shared by a large diversity of different processes that run in parallel. 

Using small molecule fuels at high concentrations results in much faster kinetics. 
Millimolar concentrations are much higher than the typical concentrations used for 
DNA fuels, with an accordingly higher on-rate for the fuel.² Furthermore, binding of a
small molecule to its binding site is faster than binding between large molecules with
multiple interactions or complex interaction surfaces: DNA hybridization requires
nucleation of a few base-pairs “at the right position,” for which the participating
DNA molecules must be in the appropriate orientation to each other. Of course, the
off-rate is also affected: While a small molecule with a comparatively small ΔG for
binding will quickly dissociate, we have to use branch migration or other means to
wrest off a DNA fuel molecule from a DNA machine to start another machine cycle.
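
The concentration argument can be made concrete with a back-of-the-envelope calculation. The sketch below assumes simple bimolecular binding, so that the mean waiting time for a fuel molecule to bind is 1/(k_on · [fuel]). The rate constant `K_ON` and the 100 nM DNA fuel concentration are assumed illustrative values, and using the same rate constant for both fuels is a simplification that isolates the effect of concentration alone.

```python
# Mean waiting time for fuel binding under pseudo-first-order conditions.
# K_ON and the DNA fuel concentration are assumptions chosen for illustration.
K_ON = 1e6  # assumed second-order rate constant, in 1/(M s)


def mean_wait_time(k_on, fuel_conc):
    """Mean time (s) until a fuel molecule binds: 1 / (k_on * [fuel])."""
    return 1.0 / (k_on * fuel_conc)


scenarios = {
    "small-molecule fuel at 1 mM": 1e-3,   # molar
    "DNA fuel at 100 nM (assumed)": 1e-7,  # molar
}

for label, conc in scenarios.items():
    print(f"{label}: {mean_wait_time(K_ON, conc):.3g} s")
```

With these assumed numbers, the concentration difference alone accounts for four orders of magnitude in waiting time, before any difference in the intrinsic binding rate is taken into account.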


² Experiments with optimized, “leakless” strand displacement systems have demonstrated operation
at concentrations of up to 10 µM [25].
Other aspects that play a role in biological machines and may guide the further
improvement of artificial machines relate to the coupling of fuel consumption to 
conformational changes. In biology, sometimes only a small change—binding of ATP 
into a tight pocket or changing a single charge by phosphorylation—is transduced 
mechanically to a large change in geometry. DNA nanotechnology has not achieved 
this level of molecular control yet. It remains to be seen whether DNA—even in 
chemically modified form—can be used for such mechanisms at all. 

If we want to stick with chemical driving mechanisms rather than physical manipulation
of DNA nanodevices with magnetic [26] or electrical fields [27], or light [28-30],
we will also need to figure out a mechanism to provide high-energy fuels for
longer periods of time. If we do not want to use manual or microfluidic addition of 
fuels and removal of waste [31], we will need to embed the DNA devices within a 
reaction cycle that generates the fuels—in other words, some kind of metabolism is 
required. This probably cannot be achieved with DNA alone. 

Potential tasks for building future molecular machines based on DNA might thus 
be the following: (i) find the “ATP equivalent” for DNA nanotechnology. It would 
be great to have a universal fuel that is used by all kinds of DNA machinery (which, 
of course, would mean that we have to give up sequence-addressability through the 
fuel); (ii) find a way to couple fuel consumption to a quick conformational change; 
(iii) realize some kind of metabolism that generates fuel at high concentrations or 
find some other clever way to replenish it. 

What will be the role of DNA programmability in this context? 


3.2 Non-equilibrium Chemical Dynamics and Self-Assembly 


Similar considerations as for molecular machines apply to dynamic and growing 
biological structures and materials in general. Structures in biology are rarely built 
to last. There is a constant turnover of molecules within supramolecular assemblies 
like the cell membrane or the cytoskeleton, which also gives biological systems the 
ability to respond to their environment, to grow, change shape, move, etc. 

For instance, treadmilling of cytoskeletal filaments [32] involves addition of new 
monomers at one end, consuming ATP or GTP, and dissociation at the other end, 
which results in an apparent movement of the filaments. Several groups have suc- 
ceeded in generating polymers from DNA tiles or helix bundles [33-35], but the poly- 
merization process was usually based on DNA hybridization and was not coupled to 
consumption of a high-energy fuel. In biology, the growth of filaments can actually 
exert forces on membranes (in addition to the Brownian ratchet force generated by 
growth of the filaments, molecular motors are involved [36]). It is currently unclear
how the slow growth of DNA filaments could be applied for something similar. At
this point, DNA nanotechnology can provide the building blocks, but DNA-fuelled 
processes do not yet compete with the ATP/GTP-driven processes involved in the 
formation of non-equilibrium structures. 
There are opportunities, obviously, in fusing ATP-consuming proteins (or other,
potentially synthetic catalytic units) to DNA structures and thus generating molecular
systems with active dynamics. Here the DNA components will provide the spa- 
tial organization, while the dynamics are driven by some other chemical process, 
which can be supplied with fuel more easily. In principle, one could also use cellular 
metabolism itself to drive DNA or RNA-based systems and thus solve (or circumvent) 
the problem of energy supply and creation of non-equilibrium conditions. Several 
examples of toehold-mediated strand invasion processes have been demonstrated in 
vivo, where all of the components were generated by transcription reactions. This 
so far mainly refers to switchable regulatory RNA molecules (toehold switches [37] 
or conditional guide RNAs [38, 39]), but recent developments (still in vitro) also 
suggest a route to realize more complex RNA strand displacement circuitry from 
transcription products [40, 41]. 

Somewhat intermediate in this context is the utilization of DNA or RNA polymerases,
ligases, and nucleases that can be employed to power dynamic nucleic acid-based
systems in vitro [42-44]. These enzymes accomplish the production and degradation
of RNA or DNA fuels consuming nucleotide triphosphates and can therefore
keep a system out of equilibrium as long as the NTPs are supplied and waste products 
are removed. Enzyme-driven systems retain much of the sequence programmability 
of pure DNA systems and have been used to create bistable systems and oscilla- 
tors [42, 43, 45, 46], pattern forming systems [47, 48], transient dynamics [44, 49, 
50], or to control assembly/disassembly reactions [35]. In contrast to dynamic DNA 
systems based on hybridization interactions alone [51], however, enzyme-driven sys- 
tems have to cope with the biochemical idiosyncrasies of the enzymes used and are 
intrinsically less programmable. 
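
How production and degradation together hold a system away from equilibrium can be illustrated with the simplest possible rate model. This is a toy sketch that is not taken from any of the cited systems: it assumes a constant production rate `k_prod` (valid only while NTPs are supplied) and first-order degradation with rate constant `k_deg`, so that d[R]/dt = k_prod - k_deg·[R]. All parameter values and units are hypothetical.

```python
import math


def fuel_concentration(t, k_prod, k_deg, r0=0.0):
    """Analytic solution of d[R]/dt = k_prod - k_deg * [R] with [R](0) = r0."""
    r_ss = k_prod / k_deg  # steady state maintained by continuous turnover
    return r_ss + (r0 - r_ss) * math.exp(-k_deg * t)


# Hypothetical parameters: production in nM/min, degradation in 1/min.
K_PROD, K_DEG = 10.0, 0.1

# While NTPs are supplied, the fuel level approaches k_prod / k_deg.
for t in (0, 10, 30, 100):
    print(f"t = {t:3d} min, driven:  {fuel_concentration(t, K_PROD, K_DEG):.1f} nM")

# Once production stops (k_prod = 0), the fuel decays back toward zero.
for t in (0, 10, 30, 100):
    print(f"t = {t:3d} min, relaxing: {fuel_concentration(t, 0.0, K_DEG, r0=100.0):.1f} nM")
```

The steady level is set by the ratio of the two rates and persists only as long as production continues. This is the qualitative difference from a clocked system, which simply relaxes after each fuel addition.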

In spite of the limitations mentioned, it is still conceivable to extend dynamic DNA
nanotechnology to certain far-from-equilibrium processes and assemblies based
exclusively on DNA and thus to create assemblies that are not the thermodynamically
most stable ones. For instance, control of kinetics has previously been shown to be
important in the context of origami folding and made it possible to direct the folding process
toward one of several alternative possible structures [52, 53]. Kinetics of strand displacement
reactions can further be tuned via the length of the toehold [54], or by using
tricks such as remote toeholds [55], which allow the engineering of kinetic pathways. 
“Handhold-mediated strand displacement” has just recently been demonstrated to 
enable the formation of templated far-from-equilibrium DNA assemblies [56]. 


3.3 Robots 


Of course, speed is not everything. Many applications will not require fast move- 
ment or response. For instance, if we just want to arrange molecules into specific 
geometries, static DNA nanotechnology is sufficient. Growth and pattern formation
processes do not strictly have to be fast either: think of emulating plant growth or
programming the development of an organism. Where speed probably matters is in
robotics [27, 57-63]. At least in my interpretation, robotic systems should be able to
interact with their environment: sense, make decisions, respond, and act on their
surroundings. In order to be able to do that, sensing, computation, and action have to take
place on a relevant timescale. In this context, it may be useful to adopt an engineering
approach toward molecular robotic systems. We need to identify tasks for the robots 
and define desired performance characteristics and benchmarks. Very likely, for many 
applications it will not be possible to reach the aims set for the robots using DNA 
alone—any physical or chemical trick to speed up the systems will here be welcome. 

Another challenge for the realization of DNA-based molecular robotics is the inte- 
gration of different robotic components into a consistent system. DNA nanodevices 
have already been shown to be capable of sensing, computation, and movement, 
but these have been combined into a molecular robotic system only in a few cases. 
Rather than programming DNA robots at the sequence level, programming at a mod- 
ular level could be interesting. Taking inspiration from macroscopic modular robots, 
one could strive for generating optimized DNA robotic components with standard- 
ized interfaces that allow the reuse of known functional DNA modules. 

Already now, many researchers like to apply the same type of DNA origami structures
(think of Paul Rothemund’s rectangle [8]) to many different tasks, simply
because they are useful and have been proven to work; generating ever more differently
shaped structures is not necessary for many applications. Further, using localization
and spatial organization of DNA components not only allows speeding up
slow hybridization reactions by generating high local concentrations [62, 64-67];
it also allows reuse of the same components by spatial separation and thus avoids
cross-talk. Thus, strategies for defined modular interfaces and spatial arrangements 
of known components might become a new branch of molecular programming. 


4 What Lies Ahead? 


DNA nanotechnology 40 years after its inception by Ned Seeman is still developing 
rapidly. Hundreds of laboratories worldwide harness the power of DNA self-assembly 
to arrange molecules and particles into specific geometries to address scientific questions
in a wide range of research areas. Due to all the favorable properties of DNA
already mentioned above, DNA nanotechnology is here to stay: it has simply become
(or will become) a standard approach for molecular engineering, similar to lithography
or the production of nanoparticles in nanoscience. Further progress in the field
will require improving the chemical versatility of DNA and DNA analogues without 
sacrificing the unique capability for sequence-programmable self-assembly. 

The greatest challenges for the field lie in the realization of dynamic functions. 
Can one generate DNA-based molecular machines that are really useful? Can one 
generate complex chemical dynamics based only on DNA? Can one emulate bio- 
logical behaviors with programmed DNA nanosystems? Will it be possible to speed 
up the systems to make them useful for applications? If not, what are the general 
principles, the insights we can derive from DNA-based model systems? 
It is likely that we will need to use some kind of dissipative process—chemical
catalytic cycles, enzymes, cellular metabolism—or physical driving to generate inter- 
esting dynamic behaviors. The challenge, then, is to use the power of DNA nan- 
otechnology to arrange the respective active components in space and to direct the 
non-equilibrium energy flow to generate and control self-organization processes. A 
major challenge will be to develop schemes to abstract these behaviors and ultimately 
make them as programmable as static DNA structures. 

Let me end with a personal note. I am glad and grateful that I can be part of 
this great interdisciplinary adventure called DNA nanotechnology—DNA nanotech 
is a wonderful scientific community, with so many highly inspiring personalities 
from all fields of research, starting, of course, with Ned Seeman who in his talks 
always emphasized the importance of knowing everything about chemistry, physics, 
biology, math, ..., and arts. Even though usefulness and applicability are mentioned
several times in my text, one should not overemphasize them. Using DNA as a generic,
programmable molecular substrate to explore interesting concepts and ideas “in the
test tube” is worthwhile in its own right. It allows you to stand back and approach problems
in self-assembly and chemical dynamics at a more abstract level—and thus gain 
insights into their general governing principles. 
Fei Wang, Qian Li, and Chunhai Fan


Abstract The first demonstration of DNA computing was realized by Adleman
in 1994, aiming to solve hard combinatorial problems with DNA molecules. This
pioneering work initiated the evolution of the field of DNA computing during the
last three decades. To date, the implemented functions of DNA computing have
been expanded to logic operations, neural network computations, time-domain oscillator
circuits, distributed computing, etc. Herein, the history of DNA computing is
briefly reviewed, followed by discussions on opportunities and challenges of DNA-based
molecular computing, especially from the perspective of algorithm design.
Future directions and design strategies for next-generation DNA computing are also
discussed.


1 A Brief History of DNA Computing 


Nature-evolved DNA molecules are the primary information-carrying medium of life 
[1]. The computing power of DNA relies primarily on its structural potential. In 1953, 
Watson and Crick first proposed the double-helix structure of DNA, which marked a
key step in uncovering the secret of life [2]. In 1982, Seeman for the first time proposed a
rational design of a Holliday junction-like branched DNA structure [3], pioneering the
endeavor to construct human-defined structures using DNA beyond the secret of life.
This work and the subsequent progress in DNA nanotechnology provided insights and
a toolbox for the design of dynamic structures to implement computing algorithms.
In 1994, Adleman proposed a DNA-based algorithm to solve a Hamiltonian path 
problem [4]. This work for the first time demonstrated the feasibility of carrying out 
computations using synthetic DNA molecules, thereby signaling the start of DNA 
computing. The parallelism far beyond that of conventional silicon-based computers 
attracted wide interest. Following this work, efforts were made to explore the parallel 
computation ability of DNA molecules to solve various mathematically complex
problems, including the Boolean satisfiability problem (SAT) [5-9], the maximal clique
problem [10], the traveling salesman problem [11], etc. During this period, it was real-
ized that the available molecular parallelism was not enough to combat the slow 
clock speed of biochemical operations and the redundancy required to combat the 
high intrinsic error rate of the operations. Evolutionary computation models were 
proposed to overcome the limitations, although not experimentally demonstrated 
[10, 12-15]. 

Subsequently, a variety of new molecular mechanisms for implementing compu- 
tational algorithms were proposed, which greatly enriched the toolbox of DNA 
computing [16-24]. In 2000, as a milestone in DNA computing, an enzyme-free 
DNA “tweezer” was proposed by Yurke et al., which could switch between ON 
and OFF states through strand displacement reactions [25]. Based on the accurate base
pairing principle, with tunable reaction kinetics [26-31] and spontaneous execution,
strand displacement reactions immediately enabled various modular designs of DNA
computing architectures [16, 18, 32-34]. In 2004, Dirks and Pierce realized triggered
amplification by hybridization chain reaction (HCR) [17], which has been broadly 
applied in computing [35], biosensing [36], and self-assembly [37, 38]. In 2006, 
Seelig et al. proposed a toehold-mediated strand displacement scheme to construct 
enzyme-free logic circuits [18]. In 2007, Zhang et al. developed a signal amplification
reaction network that uses a DNA strand as a catalyst [39].

With advances in the DNA computing toolbox, various computational functions
(e.g., automaton [40, 41], logic computing [42, 43], neural network computing [44,
45], cargo-sorting [46], and maze solving [35]) have been realized. Figure 1 presents
a timeline of representative advances in DNA computing, classified mainly according 
to the realized functions and design principles. In 2003, Benenson et al. reported a 
molecular automaton that uses DNA both as data and fuel [40]. In 2010, Pei et al. 
first developed a programmable computing device to play a game [41]. In 2011, 
Qian and Winfree proposed a simple yet modular computing unit, the “seesaw” motif,
with which large-scale digital computation [42] and neural network computation [44] 
were implemented experimentally with improved performance. In 2012, Padirac et al. 
developed switchable memories using DNA and DNA processing enzymes [19]. In 
addition to state jumps, they also implemented time-domain programming. Recently, 
temporal dynamics programming was further developed to implement a predator-prey
reaction network [47], an adaptive immune response simulator [48], and enzyme-free
oscillators [49]. Logic computing has also been developed with the introduction of
DNA origami-based logic gates [50], spatially localized logic gates [33], an integrated
gene logic chip [51], single-stranded gates [52], and DNA switching circuits [53].
Meanwhile, task-oriented DNA molecular algorithms have been demonstrated in 
recent years, such as edge detection [54], cargo-sorting [46], maze solving [35], and 
pattern recognition [45]. 
Fig. 1 Timeline of representative advances in the field of DNA computing [4-10, 16-19, 21, 25,
30, 32, 33, 35, 39-41, 44, 47-56] 


2 Opportunities and Challenges 


Highly ordered arrangement of physically addressable computing units at the 
nanoscale facilitates high-performance computing of silicon-based conventional 
computers. To complete a calculation task, the corresponding electronic transmission 
paths are activated by addressing to map the algorithm to the hardware. In contrast, 
DNA computing functions are mainly implemented with interactions between DNA 
and DNA [18], other biological molecules (e.g., RNAs [57], proteins [19, 51], and 
small molecules [58]), or environmental conditions (e.g., light [59], pH [60], and ions 
[61]). These molecular computing units are addressed chemically rather than physically,
since they are mixed in solution with indistinguishable locations. To carry out a
required computing task, a DNA-based chemical reaction network is programmed to
run according to specified rules (algorithms) by designing and controlling the interactions
between DNA and the above elements. The differences in underlying implementations
between DNA and solid-state computing devices (e.g., electronic and optical
computers) result in unique advantages and challenges for further development of 
DNA computing. 


2.1 Bridge Between Matter and Information 


Computing relies on information processing, transfer, and storage. For conventional 
storage media, information is stored in the specified (magnetic, optical, electrical) 
states of matter by spatial manipulation of these storage media. In contrast, the aperiodic
nucleobases make a DNA strand itself a piece of information. This not only leads
to ultrahigh information density in DNA, but also bridges matter and information.
With aperiodic nucleobases and regularly sized hydrogen bonds, the DNA double helix
is an aperiodic crystal, which supports reliable information storage and transfer [62]. 
The information stored in DNA is transferred into RNA and proteins through genetic 
transcription and translation. The stereoregular duplex structure allows DNA to be 
accessed and processed by sequence-independent enzymes for self-replication and 
degradation. Thus, DNA links a wide matter world at the molecular level with the 
sequence information it carries. 


2.2 Massive Parallelism 


DNA computing was initially proposed to solve mathematically complex problems,
such as the Hamiltonian path problem [4], the SAT problem [5], and the maximum clique
problem [10], taking advantage of specific and highly parallel binding between
DNA molecules. Due to the stochasticity of molecular interactions, every individual 
molecule of the same population randomly follows certain permitted reaction paths. 
According to the law of large numbers, all possibilities are covered as long as the 
number of molecules is sufficient. Compared with algorithms that search for every 
possible combination one by one until the answer is found, DNA-based molecular 
algorithms can greatly reduce the time complexity. 
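
The idea that a large random population explores all permitted paths, after which the answers are selected, can be mimicked in software. The sketch below is a toy analogue of this generate-and-filter strategy for a Hamiltonian path problem. The five-node directed graph `EDGES` and all function names are hypothetical, and each random walk stands in for one molecule that assembles along permitted edges.

```python
import random
from itertools import permutations

# Hypothetical directed graph: node -> list of nodes reachable in one step.
EDGES = {0: [1, 2], 1: [2, 3, 4], 2: [1, 3, 4], 3: [2, 4], 4: []}


def random_path(edges, start, length, rng):
    """One 'molecule': a random walk that only follows permitted edges."""
    path = [start]
    while len(path) < length and edges[path[-1]]:
        path.append(rng.choice(edges[path[-1]]))
    return tuple(path)


def is_hamiltonian(path, edges, start, end):
    """Filter step: right length, right end points, every node visited once.

    Edge validity needs no check because random_path only follows edges.
    """
    n = len(edges)
    return (len(path) == n and path[0] == start
            and path[-1] == end and len(set(path)) == n)


def generate_and_filter(edges, start, end, samples, seed=0):
    """Generate many random candidates, keep only the valid answers."""
    rng = random.Random(seed)
    pool = (random_path(edges, start, len(edges), rng) for _ in range(samples))
    return {p for p in pool if is_hamiltonian(p, edges, start, end)}


def exhaustive(edges, start, end):
    """Reference answer: test every ordering of the inner nodes one by one."""
    inner = [v for v in edges if v not in (start, end)]
    found = set()
    for perm in permutations(inner):
        path = (start, *perm, end)
        if all(b in edges[a] for a, b in zip(path, path[1:])):
            found.add(path)
    return found


print(sorted(generate_and_filter(EDGES, 0, 4, samples=5000)))
```

In software the random candidates are still produced one after another, so nothing is gained. In the test tube all candidates form simultaneously, which is where the reduction in time comes from.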

It should be noted that the parallelism underlying molecular interactions relies 
on the availability of participating molecules. For example, a 500-node traveling
salesman problem has more than 10^1000 potential solutions, which is beyond the
number of available molecules, as 1 L of a 1 M solution could provide only 6 × 10^23
manipulable molecules. Besides, as proposed by Back et al. [12], a huge number
of DNA molecules can participate in a calculation in parallel to generate a random
population of candidate solutions, followed by a filtering step to remove all DNA
molecules not representing a solution to the problem. Such a “filtering approach”
becomes infeasible as the problem size grows, since it becomes difficult to select 
a small number of answer products from a large number of non-answer products. 
Theoretically, this limitation of parallelism in problem size could be overcome by 
evolutionary algorithms [12, 13, 63]. 
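
The mismatch between solution count and molecule count can be checked with a short
calculation. This is a sketch under stated assumptions: the traveling salesman problem
is taken to be symmetric, so that n nodes give (n-1)!/2 distinct round trips, and the
function names are illustrative only.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def log10_tsp_tours(n_nodes):
    """log10 of (n-1)!/2, the number of distinct round trips in a symmetric TSP."""
    # math.lgamma(n) equals ln((n-1)!), which avoids building huge integers.
    return (math.lgamma(n_nodes) - math.log(2)) / math.log(10)


def log10_molecules(volume_l, molar):
    """log10 of the number of molecules in volume_l litres at the given molarity."""
    return math.log10(volume_l * molar * AVOGADRO)


def largest_enumerable_tsp(volume_l=1.0, molar=1.0):
    """Largest node count for which one molecule per tour would still fit."""
    n = 3
    while log10_tsp_tours(n + 1) <= log10_molecules(volume_l, molar):
        n += 1
    return n


if __name__ == "__main__":
    print(f"tours for 500 nodes: about 10^{log10_tsp_tours(500):.0f}")
    print(f"molecules in 1 L at 1 M: about 10^{log10_molecules(1.0, 1.0):.1f}")
    print(f"largest fully enumerable instance: {largest_enumerable_tsp()} nodes")
```

Under these assumptions, one litre of a 1 M solution could hold one molecule per tour
only up to 25 nodes, even before any losses in generation or filtering are considered.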


2.3 Scalability 


Conventional computing devices are based on the integration and spatial arrangement
of identical building blocks. For these systems, scaling is realized by integrating
more building blocks. In contrast, DNA computing relies on specific interactions of
orthogonal sequences. The 4-base code and the base-pairing rule of DNA sequences
support a rich library of orthogonal molecules for large-scale computing. However, as
the number of required molecule types increases, the Hamming distances between
DNA strands become smaller, leading to transient or stable unwanted binding of
DNA strands [15]. In addition, molecules participating in a basic function need to
search for each other through diffusion. Increasing the number of molecule types
increases the chance that a molecule binds to an unwanted partner and thus decreases
the probability that it is found by its target molecule, which reduces computing
speed and increases leakage. Therefore, the scalability of DNA computing is limited
to a certain finite size that cannot be extended by simply adding more computing
units.
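
The trade-off between library size and sequence distance can be illustrated with a toy
enumeration. This is a simplified sketch: it uses plain Hamming distance between
equal-length sequences and a greedy scan, whereas real sequence design must also
account for complements, secondary structure, and thermodynamics. All names are
illustrative.

```python
from itertools import product


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def greedy_library(length, min_distance):
    """Scan all sequences in lexicographic order and keep each one that is at
    least min_distance away from every sequence already kept."""
    library = []
    for bases in product("ACGT", repeat=length):
        seq = "".join(bases)
        if all(hamming(seq, kept) >= min_distance for kept in library):
            library.append(seq)
    return library


def library_sizes(length):
    """Greedy library size for every required minimum distance."""
    return {d: len(greedy_library(length, d)) for d in range(1, length + 1)}


if __name__ == "__main__":
    for d, size in library_sizes(4).items():
        print(f"min distance {d}: {size} sequences")
```

For 4-base sequences, requiring a distance of only 1 admits all 256 sequences, while
requiring the maximum distance of 4 leaves just AAAA, CCCC, GGGG, and TTTT. Read in
the other direction, a larger library forces its members closer together.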


3 Directions for Future Development and Potential 
Approaches 


3.1 Scaling-Up 


The number of computing elements in a computing system determines the complexity
of the programs it can execute. As with other computational machines, increasing
the circuit scale is an important direction of evolution. The first modern electronic
computer, ENIAC, contained 18,000 electron tubes [64]. Nowadays, tens of billions of
transistors are integrated into an everyday mobile phone chip. For DNA computing,
the effective scope is still limited to problems that contain a small number of nodes
or variables, using fewer than a few hundred participating DNA strands. Therefore,
scaling up circuits is an important prerequisite for the functional evolution of DNA
computing, which raises challenges including: (1) the specificity and efficiency of
molecular reactions deteriorate as the number of participating molecules increases;
(2) a scalable computing architecture that automatically maps complex algorithms
to hardware implementations is lacking. Several strategies for scaling up DNA
computing have been proposed, and further exploration could be worthwhile.


3.1.1 Spatial Separation 


A whole reaction can be split into different compartments by spatial separation. As a
result, molecules in different compartments are prevented from meeting each other,
so the reactions can be carried out efficiently in each compartment. Both
semiconductor circuits and cells use spatial separation to control the pathways of
material and information transmission and thereby complete complex computing
tasks. In DNA computing, spatial separation has also been explored [65-68]. In
2011, Chandran et al. proposed a theoretical framework to implement parallel and
scalable computation, using localized strand displacement reactions on the surface of
a DNA nanostructure [65]. In 2016, Genot et al. realized simultaneous observation of
10^4 reactions by encapsulating a computational reaction system with various input
conditions into droplets [67]. The molecules in each droplet reacted independently,
since the oil film blocked molecular communication between droplets (Fig. 2a),
which enabled parallel and high-resolution mapping of circuits.

Fig. 2 Three types of spatial separation that may inspire design of DNA computing systems

Communicative separation allows different spatial positions to store different
chemical information with input/output communication. In 2019, Joesaar et al.
encapsulated basic logic blocks into proteinosomes as protocells to mimic the function
of natural cells [55]. As the output of one reaction, the DNA strands released in one
protocell could pass through protocell membranes, diffuse in solution, and then enter
another protocell, where they triggered downstream reactions as input strands. This
approach was supported by protocell-protocell communication (Fig. 2b) and
holds potential for high-performance DNA computing with distributed systems.
However, the computation demonstrated so far is limited to several protocells with
single logic gates inside. The circuit size in each protocell and the communication
efficiency between protocells remain to be explored.

The addressability of DNA origami at nanometer precision facilitates the confinement
of molecular reactions on the DNA origami surface, which enables templated
separation of reactions on different origamis (Fig. 2c). In 2017, Chatterjee et al.
proposed spatialized information propagation by using DNA origami as the canvas to
design circuits [33]. The circuit on DNA origami receives input signals and fuel
strands from the solution and releases output strands for readout, allowing signal
communication between origamis. However, circuits spanning multiple origamis via
inter-origami communication have not been realized. Due to the lack of threshold and
amplifier functions in such spatially separated systems, this strategy still faces
challenges as the scale of the circuit increases. DNA self-assembly may provide
alternative solutions for constructing localized response elements with noise
suppression and signal amplification functions.


3.1.2 Combination of Order and Disorder 


Collision events of DNA molecules in solution are disordered, which brings a high
degree of parallelism to DNA computation together with inaccuracy to some extent.
The parallelism means subdividing a deterministic space into a probability space for
one calculation. Taking the search algorithm as an example, in sequential computing,
one possible path is explored at a time, and a specific output is generated. For
parallel DNA molecular reactions, all possible paths are explored simultaneously. As
the number of possible reaction paths increases, detectors with higher sensitivity and
accuracy are required to obtain the solution to a problem. In addition, as the number
of molecules in the system increases, a single molecule is more likely to be in a
non-specifically bound state, which consequently reduces the speed and probability of
following the correct reaction path and increases the probability of signal leakage.

Fig. 3 Schematic illustration of high-level ordered and low-level disordered architecture

Living systems provide unique examples of coordinating reactions that involve a
large number and variety of molecules. In cells, spatial compartments are utilized to
confine reactions to small reaction containers. With a certain degree of fluidity, the
skeletal structures of cells provide a heterogeneous environment for disordered
reactions. Cells aggregate to form ordered tissues, organs, and finally organisms. With
the combination of disorder and order, organisms have evolved advanced computing
capabilities (e.g., learning, thinking, and decision-making). High-precision
manufacturing technologies, including DNA nanotechnology, hold potential to build
highly ordered containers for DNA molecular reactions. The ordered organization of
computing modules, together with the disordered molecular reactions, will provide
high parallelism and overall coordination to the computing system (Fig. 3). It may
thus be possible to develop more complex artificial DNA-based computing systems
in vitro with improved synthetic intelligence.


3.1.3 Reversible and Directional Reaction 


Currently, the generate-and-test approach is the one most widely used to demonstrate
DNA computing experimentally. When the number of potential solutions exceeds the
number of available molecules, a problem becomes theoretically infeasible. In fact,
the faithful readout of the result is also limited by the proportion of correct
calculation results. A solution to a problem can be viewed as a correct assembly of
DNA molecules, and a high yield of the correct DNA assembly benefits the filtration
of the correct answer. If the yield is too low, a solvable problem may be
misinterpreted as having no solution. Condon and coworkers proposed a strategy
using reversible strand displacement reactions for DNA computations that are space
and energy efficient [69]. Thubagere et al. proposed a random walk-based algorithm
for cargo sorting [46], in which a single-stranded DNA robot picks up cargo
irreversibly through toehold-mediated strand displacement reactions. Carrying the
cargo, the robot performs a reversible random walk among adjacent tracks via toehold
exchange until reaching the goal track for cargo drop-off.

Fig. 4 Schematic illustrations of irreversible search algorithm (a) and reversible search algorithm
(b) for a maze

The reversible motion strategy mentioned above could be extended to solve
complex optimization problems. Taking a maze-solving problem as an example, and
assuming that stepping in each allowed direction is equally probable, the probability
for a navigator to reach the exit of the maze shown in Fig. 4a is 1/3456. Each
individual navigator randomly follows a correct or wrong path, and one that follows
the correct path needs only 16 steps to reach the exit (Fig. 5a). However, this
probability suggests that only ~0.03 nM of correct assembly would be obtained with
100 nM navigators, making it hard to detect the correct solution. Using a reversible
search algorithm, the navigator can return to a node that it has already visited, while
its last step to the exit is irreversible (Fig. 4b). With this approach, there is no dead
end except the exit; thus, every navigator is capable of reaching the exit. In a
simulation, more than 50% of navigators reached the exit within 1000 steps (Fig. 5b).
Because an intermediate node may be visited repeatedly, the reversible navigator
takes remarkably more time to reach the exit than the irreversible one. However, the
arrived percentage of reversible navigators exceeds that of irreversible navigators
within 100 steps (Fig. 5c). For a reversible system, sacrificing time will probably lead
to a higher success probability with higher yields of correct DNA assemblies, which
may provide solutions to complex tasks beyond the current practical computing
power.
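
The contrast between the two search modes can be reproduced on a much smaller maze.
The sketch below is not the maze of Fig. 4 or the simulation behind Fig. 5: the maze
layout, node labels, and function names are hypothetical, and probabilities are
propagated exactly instead of sampling individual navigators.

```python
# Hypothetical toy maze: a corridor S-A-B-C-E in which each junction
# A, B, C also opens onto a dead end (d1, d2, d3). E is the exit.
MAZE = {
    "S": ["A"],
    "A": ["S", "B", "d1"],
    "B": ["A", "C", "d2"],
    "C": ["B", "E", "d3"],
    "d1": ["A"],
    "d2": ["B"],
    "d3": ["C"],
    "E": ["C"],
}


def irreversible_success(maze, start, exit_node):
    """Exact probability that a navigator that never revisits a node reaches
    the exit, choosing uniformly among the unvisited neighbours."""

    def walk(node, visited, prob):
        if node == exit_node:
            return prob
        options = [n for n in maze[node] if n not in visited]
        if not options:  # dead end: the navigator is stuck for good
            return 0.0
        return sum(walk(n, visited | {n}, prob / len(options)) for n in options)

    return walk(start, frozenset([start]), 1.0)


def reversible_arrived(maze, start, exit_node, steps):
    """Exact fraction of navigators absorbed at the exit within `steps` steps
    when every step except the final one into the exit can be undone."""
    distribution = {start: 1.0}
    arrived = 0.0
    for _ in range(steps):
        following = {}
        for node, prob in distribution.items():
            share = prob / len(maze[node])
            for neighbour in maze[node]:
                if neighbour == exit_node:
                    arrived += share  # irreversible last step
                else:
                    following[neighbour] = following.get(neighbour, 0.0) + share
        distribution = following
    return arrived


if __name__ == "__main__":
    print("irreversible:", irreversible_success(MAZE, "S", "E"))
    for steps in (4, 50, 1000):
        print(f"reversible, {steps} steps:", reversible_arrived(MAZE, "S", "E", steps))
```

In this toy maze the irreversible navigator succeeds with probability 1/8 after four
steps and never improves, whereas the reversible navigator starts lower (1/27 after
four steps) but approaches certainty when given more steps, which mirrors the
time-for-yield trade described above.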


3.1.4 More Efficient Molecular Searching Modes 


Fig. 5 Simulated overall success rates of maze solving. a All navigators that can reach the exit
undergo 16-step propagation in the irreversible algorithm. b Arrived percentage increases with
step number in the reversible algorithm. c Comparison of arrived percentage in the two propagation
modes within 200 steps

In homogeneous solutions, molecules recognize each other mainly through diffusion.
This is why DNA computing systems constructed from diffusive building blocks
usually face a limitation in reaction rates toward the correct pathway. Commonly, tens
of hours are needed to finish a calculation when hundreds of DNA strands are involved
[45]. Fast computing at low DNA concentrations could be realized by introducing
new molecular searching modes. As a successful demonstration, calculations were
completed in minutes at nanomolar concentrations [33] by using DNA origami
to confine the diffusion of each computing element to a nanoscale region.
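
A rough order-of-magnitude estimate shows why confinement speeds up the search. The
sketch below is a back-of-the-envelope model, not the analysis of ref. [33]: the 10 nm
confinement radius and the rate constant of 10^6 M^-1 s^-1 are assumed values chosen
only for illustration, and tethered reactions on a real origami do not follow simple
bulk kinetics.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def effective_concentration(radius_m):
    """Molar concentration of one molecule confined to a sphere of given radius."""
    volume_l = (4.0 / 3.0) * math.pi * radius_m**3 * 1000.0  # m^3 -> litres
    return 1.0 / (AVOGADRO * volume_l)


def bimolecular_half_time(rate_constant, concentration):
    """Half-time 1/(k*c) of a second-order reaction between two species that
    start at the same concentration c."""
    return 1.0 / (rate_constant * concentration)


if __name__ == "__main__":
    k = 1e6  # assumed hybridization rate constant, M^-1 s^-1
    bulk = 1e-9  # 1 nM in free solution
    local = effective_concentration(10e-9)  # assumed 10 nm confinement radius
    print(f"bulk half-time:  {bimolecular_half_time(k, bulk):.0f} s")
    print(f"local conc.:     {local * 1e3:.2f} mM")
    print(f"local half-time: {bimolecular_half_time(k, local) * 1e3:.1f} ms")
```

Under these assumptions, confining a partner within about 10 nm corresponds to a local
concentration of roughly 0.4 mM, several orders of magnitude above a 1 nM bulk
solution, so each elementary step becomes correspondingly faster.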

Inspiration may also be obtained from natural systems. In the cellular environment,
the search of a protein for its target DNA segment generally involves complex
motions, including sliding, hopping, and intersegmental transfer as well as
diffusion [70] (Fig. 6). When the DNA strand is stretched, the protein prefers a
1-dimensional search along the strand. When the DNA strand is coiled, which is its
native state, proteins can transfer between spatially adjacent segments. By combining
these searching modes, proteins can find their targets quickly throughout the whole
cell even at very low concentrations. Learning from these natural molecular searching
modes, high-performance DNA computing may be developed based on novel molecular
interaction mechanisms. The accurate spatial addressability of DNA nanotechnology
and the tunable mechanical rigidity of DNA nanostructures may offer novel scaffolds
for constructing molecular machines with new modes of molecular motion. With these
molecular machines, faster and more efficient molecular recognition may be possible,
which would increase the executable program complexity of DNA computing systems.


Fig. 6 Possible DNA searching modes of proteins in cells (3D diffusion, sliding, hopping, and
intersegmental transfer) that may inspire more efficient molecular interactions for DNA
computing in solution


3.2 Updating and Reusing


As a natural computer, life accumulates memories from past experiences and renews
and upgrades itself. Electronic computing chips also update their states, triggered
by either periodic clock signals or aperiodic pulse signals. Most proposed DNA
computing devices are disposable and cannot be reused, because their computing
elements are permanently destroyed after calculation. DNA computing is therefore
less developed in this respect, and developing renewable and reusable circuits will
greatly expand its application scope.


3.2.1 Hardware Resetting 


In DNA computing, resetting the hardware state relies on chemical reactions,
which is an important challenge that limits the sustainable use of computing devices.
Spontaneous chemical reactions proceed in the direction of decreasing free energy. By
adding new strands to trigger reverse strand displacement [71], or by using a
nicking enzyme to change the energy state of the system [72], the reaction can be
reversed and the input signal degraded, thereby resetting the computing device.
Although resettable circuit implementations have been validated, the concentration
of input strands recovered for the next computing cycle decreases rapidly, which
prevents the circuit from being recycled repeatedly. To perform sequential operations
as electronic computers do, further exploration of the design of resettable DNA
computing systems is needed. Meanwhile, in combination with the parallelism of
molecular reactions, highly parallel computing within one clock cycle may be
developed. In this direction, improving the reset efficiency requires simpler
molecular structure designs and a deeper understanding of the underlying mechanisms.


3.2.2 Iteration and Update of Molecular Reaction Networks 


Neuromorphic computing empowers artificial devices to learn from new inputs and
update themselves. Recently, neural network computing based on DNA circuits has
been demonstrated [44, 45, 57]. However, the weight values of these neural networks
were trained in silico. The one-time-use nature of DNA circuits makes them unable to
renew themselves and thus incapable of learning.

Evolutionary DNA algorithms have been proposed to overcome the limitations of
parallelism by dividing the selection of the final answer from a single huge pool into
recursive selections from various small pools [12, 13]. For example, in Systematic
Evolution of Ligands by EXponential Enrichment (SELEX), a destructive process is
performed to remove intermediate results that do not fit the constraints of a problem.
These iterative selections allow the computing results to evolve toward the solution,
which differs from Adleman-style computing with a single selection at the final
step [73-76]. The evolutionary strategy also makes it possible to train molecular
models for machine learning. Rondelez’s team has developed a series of evolutionary
DNA reaction networks using a toolbox of DNA processing enzymes, i.e., polymerase,
nickase, and exonuclease [19, 47, 77-79]. With a rich library of DNA processing
enzymes to generate, transfer, and degrade DNA signals, it is promising to
experimentally implement more complicated evolution with DNA-based reaction
networks that mimic biological systems in vitro.
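
Why repeated selection helps where a single filtering step fails can be seen from a
simple enrichment model. This is an idealized sketch and not a description of any
cited experiment: the retention probabilities and the starting fraction are
hypothetical, and amplification is assumed to restore the pool size without bias.

```python
def enrich(fraction, keep_target, keep_other):
    """Fraction of target sequences after one selection round followed by
    unbiased amplification back to the original pool size."""
    kept_target = fraction * keep_target
    kept_other = (1.0 - fraction) * keep_other
    return kept_target / (kept_target + kept_other)


def rounds_to_majority(fraction, keep_target, keep_other, max_rounds=100):
    """Number of rounds until targets make up more than half of the pool."""
    for n in range(1, max_rounds + 1):
        fraction = enrich(fraction, keep_target, keep_other)
        if fraction > 0.5:
            return n
    return None


if __name__ == "__main__":
    start = 1e-9  # assumed: one answer molecule per billion candidates
    after_one = enrich(start, keep_target=0.5, keep_other=0.005)
    print(f"after one round: {after_one:.1e}")
    print("rounds to majority:", rounds_to_majority(start, 0.5, 0.005))
```

With these assumed numbers, an imperfect filter that favors answers 100-fold leaves
them at about one in ten million after a single pass, yet five rounds of selection and
amplification make them the majority of the pool.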


4 Summary 


DNA computing has evolved slowly but steadily during the last 30 years. Despite the
remarkable progress, challenges remain in many facets, such as functional diversity,
feasible circuit size, and computing efficiency. In particular, DNA computing relies
on molecular diffusion and recognition of DNA molecules, which is fundamentally
different from conventional and other types of computing systems that use a universal
signal carrier (e.g., electrons or photons). We envision that next-generation DNA
computing with molecular intelligence may evolve with inspiration from both natural
living systems and electronic computers.


Abstract In this essay, the evolution of DNA nanotechnology research in Japan to
date is reviewed. The expansion of the research community in Japan and the
trends in the selection of project themes are described, and the researchers who
participated in these projects are identified. Some aspects of the research history
of the author, who entered the field from robotics, are also introduced, as this
information may be of interest to young students and researchers.


1 Introduction 


In 1982, when Professor Seeman began his research on DNA nanostructures, there
were no studies of this kind in Japan. In 1994, Adleman’s work on DNA computers
was published, and two years later, Masami Hagiya of the University of Tokyo
initiated a research project on molecular computing [1]. From this starting
point, DNA nanotechnology in Japan has evolved over approximately
25 years. Hagiya went on to lead research in Japan for the next 20 years. Figure 1
shows the genealogy of related projects in Japan.

Early projects explored the possibility of massively parallel computation with 
molecules, and in 2001, a project on molecular programming [2] began, which 
included researchers from disciplines such as mechanical, materials, and medical 
engineering. These scientists introduced the idea of extending the function of the 
system beyond “computation,” to sensing of the environment and acting accord- 
ingly. After several years of preparation, the project of “molecular robotics” was 
launched in 2012 [3]. 

There are several reasons why the concept of “molecular robots” originated in
Japan. One is that Japan is a country where robotics research is very active, with
a large number of researchers in both the theoretical and applied fields.
There is also a latent familiarity with the word “robot” through various depictions
in cartoons and animations. On the other hand, research in chemistry, especially
organic chemistry and polymer science, is also active in Japan, as is evidenced by
the fact that the country has produced a number of Nobel laureates in chemistry.
Many researchers in these fields are working on specific applications, while others
are attempting to create new molecules out of scientific curiosity, with the expectation
that their molecules will find beneficial applications in the future. The author
believes that these types of researchers may have been attracted by the concept of
“building robots with molecules.”

Fig. 1 DNA nanotechnology-related projects in Japan

During the five years of the project, several prototypes were developed. The
amoeba-type molecular robot contained various molecular devices encapsulated in
artificial cells. In the gellular automaton, programming of a molecular
computing system on a gel medium was investigated. In parallel with the progress of
the project, a community of researchers in molecular robotics was organized as an
arm of the Society of Instrument and Control Engineers (SICE), and research
workshops and national conferences were held regularly. In 2020, a project on
“molecular cybernetics” was launched, with the author as its representative [4]. The
goal of this project was to create a system with certain information-processing
capabilities by combining artificial cells developed through molecular robotics.


2 How the Author Got Involved in DNA Nanotechnology 


The author specializes in robotics, specifically distributed autonomous robotic
systems (DARS) [5]. Research on these systems aims to achieve flexibility and
adaptability that are not possible with conventional robots, through the coordination
of a large number of autonomous robotic agents. The author is particularly interested
in “homogeneous” distributed autonomous robotic systems, that is, systems consisting
of a large number of robot modules of exactly the same type. Each module is a small
robot with a microcomputer, sensors, and actuators. (This type of robot is therefore
called a modular robot.) Similar to Lego blocks used to build a house, the modules
not only connect with each other to form different shapes, but also move and
recombine themselves to change the shape of the whole structure without external aid.
My group proposed the concept of such modular robots and developed several prototypes
to demonstrate this concept [6]. For example, we have created a “self-repairing” robot
that can automatically replace a failed module with a spare one, and a modular robot
that can autonomously transform itself into different forms, such as that of a dog or
a snake, and move according to these forms.

Several other groups have also developed homogeneous distributed robots, but the
number of modules was limited to between 10 and 100, and the functionality of these
prototypes was limited accordingly. There are various reasons why the number of
modules cannot be increased (i.e., why it is difficult to create a scalable system).
For instance, each module must contain components such as a microcomputer, sensors,
motors, batteries, and communication devices in a compact space, which is difficult
to design in itself; it is also challenging to guarantee mechanical, electrical, and
informational reliability, and, when possible, it comes at a large cost. Modular
robots are expected to be used for exploration and rescue in unknown environments
because of their ability to adapt to the conditions, but full-scale applications would
require thousands or tens of thousands of modules, which is out of reach with current
technology. The Kilobot developed by Nagpal et al. is a system with 1000 modules and
is probably the largest modular robot built to date [7]; however, each module is only
able to move on a plane, and the modules cannot be connected to each other.

In 2000, when I moved from the National Mechanical Engineering Laboratory 
(MEL) to the Tokyo Institute of Technology, I was unsure if I would continue my 
research on modular robots. I had been simulating the formation of networks of 
various shapes using a model in which mass points were connected by virtual springs 
[8]. I learned that it is possible to make a jigsaw puzzle of molecules using DNA,
that is, that DNA molecules can self-assemble [9]. I had always been interested in the
term “self-assembly” and even called my modular robot a “self-assembling machine” [10].
Therefore, I began studying DNA tiles with my students and developed a method 
to improve the reliability of algorithmic self-assembly, which had been proposed by 
Winfree and others at that time [11]. Subsequently, I learned directly from Professor 
Winfree and gradually entered this field. 

Many mechanical engineers change their research subjects significantly
in the course of their research life. My personal impression is that approximately
50% alter their research interests at some point. Professor Klavins at the University
of Washington is another scientist who moved from the field of modular robots to
that of bio/nanotechnology. He had worked on modular robots that operate
stochastically [12] (his system of randomly moving pizza piece-like modules on an
air-hockey table to make perfectly round pizzas is fascinating to watch). He is now
a synthetic biologist.

Young people who are about to start their research should be prepared for the
possibility that they may completely change their research field in the future. This
does not mean starting research from scratch, as the methodologies acquired up
to that point are often applicable to different research areas. Changing focus
expands their future prospects by broadening the range of methodologies available to
them.


3 The Evolution of Projects in Japan 


Let us return to the research projects in Japan. First, I would like to briefly explain the 
scientific research system in Japan. Most basic research in the sciences, engineering, 
and humanities is funded by public grants from the government. The main subsidy 
is a Grant-in-Aid for Scientific Research (KAKENHI) provided by the Ministry 
of Education, Culture, Sports, Science, and Technology (MEXT) [13]. In addition, 
the Japan Science and Technology Agency (JST), affiliated with MEXT, and the 
New Energy and Industrial Technology Development Organization (NEDO), affili- 
ated with the Ministry of Economy, Trade and Industry (METI), also provide large 
research funds, but these funds are only for application-oriented projects. 

In the field of DNA nanotechnology, KAKENHI is the main source of funding.
There are various levels of KAKENHI, ranging from a few million yen per year
for a small number of researchers to several hundred million yen per year for a few
dozen researchers. All of these are competitive research funds, and the difficulty
of obtaining them varies. As expected, grants with larger budgets are harder to
obtain, and the largest schemes are often awarded only after years of repeated
applications. In particular, the KAKENHI scheme awarded to groups of several dozen
researchers (the scheme name has changed from “Priority Area Research” to
“Innovative Area Research” to “Transformative Area Research,” but the content is
similar) aims to establish a new academic community in a field. By following the
succession of these projects, we can observe the trend of research in Japan.


3.1 The 1980s and 1990s 


In the 1980s, DNA research in molecular biology was active in Japan, but no projects
related to DNA nanotechnology had yet been initiated. DNA nanotechnology can
be divided into two categories: structural DNA nanotechnology, founded by Seeman,
and DNA computing, founded by Adleman. Of the two, DNA computing research began
earlier in Japan. That is, two years after the first paper on DNA computing by
Adleman [14], a research project on molecular computers (1996-2002) [1] was
conducted under the leadership of Masami Hagiya at the University
of Tokyo. Professor Adleman was well known in the field of computer science, and
DNA computing was expected to be a massively parallel computing architecture
that could outperform supercomputers. At that time, Takashi Yokomori (Waseda
University) and others in the Japanese computer science community had obtained
information about DNA computing through researchers in formal language theory.
In other words, expectations for molecular computation were raised at a relatively
early stage in Japan. Senior researchers Yuichiro Anzai in computer science (Keio
University) and Shigeyuki Yokoyama in structural molecular biology (University of
Tokyo), who were in a position to select new projects under the “Research for the
Future Program,” chose Professor Hagiya, who had been working on a follow-up to
Adleman’s experiment, as the leader of the project. Both the bottom-up interest of
young researchers and the top-down direction of senior researchers evidently shaped
the beginning of this field in Japan.

The molecular computer project [1] started by Hagiya et al. was joined by Takashi
Yokomori, Akira Suyama (University of Tokyo) in biophysics, and later by Masayuki
Yamamura (Tokyo Institute of Technology) in computer science. The SAT engine
using DNA hairpin formation and Whiplash PCR are among the results of the
project. The SAT engine solves the satisfiability problem, a well-known hard
combinatorial problem, by using the hairpin formation of DNA strands [15, 16].
Whiplash PCR combines the opening/closing of DNA hairpins with polymerase
elongation to achieve state representation and state transitions in a single hairpin
molecule; the term “whiplash” came into use subsequently [17, 18]. In the same year,
1996, the “Molecular Memory” project was conducted under Hagiya [19]. Azuma Ouchi
(Hokkaido University), Jun Tanida (Osaka University), and others in computer science
participated in this project, and various types of molecular memories were
developed, including “nested primer molecular memory” [20], which amplified only
DNA with a specific sequence by four-level PCR, and a molecular memory that read
information spatially by transduction between two secondary structures using the
infrared laser excitation of hairpin DNA fixed on the substrate plane. Conformational
addressing of multiple hairpin DNAs [21] was also developed in this project.


3.2 The 2000s 


In 2002-2007, the Grant-in-Aid for Scientific Research on Priority Areas “Molec- 
ular Programming” was conducted, again with Hagiya as the representative [2]. This 
was one of the largest KAKENHI research projects, involving approximately 50 
researchers and lasting five years. In contrast to the previous research on “molec- 
ular computation,” which aimed to synthesize a desired function or structure by 
utilizing the computational potential of biomolecules, “molecular programming” 
considered the process of designing biomolecules and their chemical reactions as 
“programming” and aimed to develop a systematic programming methodology for 
molecular computation. For this project, in addition to the members of the previous 
projects, Daisuke Kiga (Waseda University) in synthetic biology, Kenzo Fujimoto 
(Japan Advanced Institute of Science and Technology) in nucleic acid chemistry, and 
the present author (Tohoku University) in robotics were involved. The first half of 
the project focused on the refinement of DNA sequence design technology, and the 
second half on its application in nanotechnology and synthetic biology. 
For this project, Suyama et al. developed a molecular computational reaction
system (R-TRACS) using DNA and RNA enzymes [22], and Kiga et al. created a 
bacterial computer using E. coli [23]. In addition, the author worked with Winfree
on the evaluation of DNA tile error rates [24]. At the same time, the author attempted
to grow DNA tiles in microfluidic devices with Teruo Fujii (now President of the 
University of Tokyo). In Fujii’s laboratory, Yannick Rondelez began studying the 
reaction system using DNA enzymes in 2009, and this developed into the PEN DNA 
toolbox [25]. 

Subsequently, frequent meetings have been held, mainly by those involved in 
the molecular programming project, and a community of researchers using DNA 
nanotechnology as a tool has been formed. In June 2002, the DNA8 workshop 
was held in Sapporo (Hokkaido University), hosted by the molecular programming 
project (Professor Seeman presented a keynote lecture). 


3.3 The 2010s 


After the end of the molecular programming project, proposals were made every 
year to obtain large project research funds, but none were accepted. In 2010, the 
author’s proposal, “Development of molecular robotics by DNA nanoengineering,” 
was selected as a relatively large project under the “Grant-in-Aid for Scientific 
Research (S)” scheme. Akinori Kuzuya (Kansai University), who studied DNA 
origami in Seeman’s laboratory, and Shin-ichiro Nomura (Tohoku University), who
was involved in artificial cell engineering, participated in this project. This project
introduced the term “molecular robotics,” which uses DNA nanotechnology to
design component molecules and assemble them to create a functional molecular 
system that can respond autonomously to changes in the environment. The “Molec- 
ular Robotics Research Group” was established in the Society of Instrument and 
Control Engineers (SICE) during this time. The author served as the representative 
of this group. This research group continues, and the majority of the researchers 
involved in DNA nanotechnology and DNA computing in Japan have participated. 

In 2011, Grant-in-Aid for Innovative Field “Synthetic Biology” was implemented 
(2011-2016) with Masahiro Okamoto (Kyushu University) as the representative. 
Yamamura, Kiga, and Suyama, who participated in the molecular programming
project, moved to this group.

In 2012, the project “Molecular Robotics—Creation of molecular robots with 
sensation and Intelligence” was adopted as Innovative Area, led by Hagiya [3]. The 
core members of this project were a combination of computer science researchers 
(Hagiya, Satoshi Kobayashi (University of Electro-Communications), and Akihiko
Konagaya (Tokyo Institute of Technology)) and experimental and implementation
researchers (the author and Hirohide Saito (Kyoto University), who was in the field 
of RNA nanotechnology). 

Generally, a “robot” can be defined as an “autonomous system” that acquires 
information from the external environment using sensors, processes this information, 
and acts on the environment accordingly. A body (structure) is also required to
distinguish the system from the environment and to integrate these components. The 
goal of the project was to develop technologies for the design, fabrication, integration, 
and control of molecular robots. 

An evolutionary scenario for molecular robots was proposed at the inception 
of the molecular robotics project (Fig. 2) [26]. This scenario predicts the gradual 
evolution of molecular robots over four stages: Generation 0: A single molecule 
acts as a robot. Its behavior is random due to thermal fluctuations. This would be 
similar to the molecular spider of Stojanovic et al. [27]; Generation 1: Artificial 
cell membranes encapsulating various molecular devices. When several kinds of 
molecules are enclosed in a container, the concept of “concentration” arises, and 
its behavior can be predicted to some extent by chemical reaction kinetics. This is 
called an amoeba-type molecular robot; Generation 2: Various molecular devices are 
dispersed in a gel medium, whereby each molecule has a spatial distribution, resulting 
in a system with information on both “concentration” and “position.” This is called 
a slime-type molecular robot; Generation 3: The molecular robot of Generation 1 is 
multicellularized. Multicellular robots can be realized with more complex and diverse 
functions; and Generation 4: Hybridization, in which the molecular robot is expected 
to be combined with conventional nanotechnology such as photolithography. 

In the molecular robotics project, the goal was to achieve the first and second 
generations of this scenario. For Generation 1, the development of an amoeba-like 
molecular robot, we succeeded in constructing a system combining a light sensor, 
a DNA amplification circuit, a DNA molecular clutch, and a microtubule-kinesin 
molecular motor in a single artificial cell (liposome). We demonstrated that the 
liposome deforms like an amoeba and turns on and off using light stimuli [28]. 
For Generation 2, the slime-type molecular robot, a combination of BZ gel actuator 
(a self-oscillating gel actuator using the Belousov-Zhabotinsky reaction) and DNA 
computing was initially planned, but the operating condition of the BZ gel actuator 
required strong acidity, which is incompatible with the operating conditions for DNA, 
and this plan was abandoned. Instead, the molecular implementation of a cellular
automaton system in gel space, called a gellular automaton, was explored [29, 30]. 

In addition to amoeba-type molecular robots, various technologies have been 
developed. For example, RNA nanostructure devices have been developed as thera- 
peutics. They function under the strong noise of various biomolecules and success- 
fully control the fate of cancer cells [31]. Another example is a technique to speed 
up various DNA computational reactions by combining synthetic DNA and artifi- 
cial nucleic acids [32]. Furthermore, microtubule assemblies controlled by light- 
responsive DNA [33] and gel actuators driven by DNA-controlled gel-sol phase 
transition [34] have been developed as actuators for molecular robots. 

The results of the “molecular robotics” project were summarized in a textbook, 
“Introduction to Molecular Robotics” [35], which was published in Japanese. The 
English version of the textbook is currently being edited. In 2014, DNA20 was held in 
Kyoto (Kyoto University), hosted by the molecular robotics project. The conference 
included a keynote lecture by Professor Seeman on “Molecular machines made from 
DNA.” 

In the year following the completion of the molecular robotics project, two 
new KAKENHI projects, as Innovative Areas, were launched, namely “Molecular 
Engine” (2018-2023), led by Kazu Kinbara, and “Creation of Soft Robotics” (2018-2023),
led by Koichi Suzumori, both of whom were from the Tokyo Institute of Technology.
These were the research areas of active matter and soft robotics, respectively.
Some of the members of Molecular Robotics have participated in these projects, and 
collaboration between these areas is progressing. 


3.4 Current Research 


Four years after the molecular robotics project, a project entitled “Molecular Cyber- 
netics—Construction of a minimal artificial brain by the power of chemistry” was 
adopted as a KAKENHI Transformative Area (A) [4]. Transformative Areas (A) are 
similar to Innovative Areas. The Molecular Cybernetics project is led by the author with
18 core researchers, including Taro Toyota (University of Tokyo), Shin-ichiro
Nomura, Akinori Kuzuya, and Takashi Nakakuki (Kyushu Institute of Technology),
who works in control engineering.

The amoeba-type molecular robot can have a variety of molecular devices imple- 
mented in a single artificial cell (liposome). However, it is difficult to derive appro- 
priate solution conditions under which all the different molecular species, such as 
light-responsive artificial nucleic acids, DNA devices, and molecular motors, can 
function. To solve this problem, we use multiple liposomes that contain solutions 
suitable for each of these molecular devices and bind them together to build a system. 
The liposomes are connected by a “transducer” for communication, which transmits 
molecular information without mixing the solutions. Our goal is to realize a simple
learning function on a three-liposome system called a minimal artificial brain (Fig. 3). 
In molecular cybernetics, there are several new approaches that have not been
attempted before. One of these is a variety of free services for members of the group.
Kazunori Matsuura of Tottori University (peptide engineering) runs a peptide 
synthesis center that provides various peptide chains upon request from the members 
of the project. Keiji Murayama of Nagoya University (nucleic acid chemistry) runs 
the nucleic acid synthesis center, which provides nucleic acid sequences containing 
special artificial bases. In addition, Kuzuya of Kansai University will provide a range 
of single-molecule observation services using various microscopes and atomic force 
microscopes. Another new activity of the project is journalist-in-residence. Miki- 
hito Tanaka of Waseda University (social analysis) hosts external non-specialists, 
such as newspaper journalists, science writers, and science fiction authors. They will 
help society to accept the advanced concept of molecular cybernetics by providing 
an objective view of the ongoing project and disseminating information as links 
between researchers and the general public. In 2023, DNA29 will be held in Sendai 
(Tohoku University), hosted by the molecular cybernetics project. 


4 Summary 


This essay describes the history of the development of the research community in 
Japan, focusing on the evolution of research projects conducted to date. In addition, an 
example of how researchers can become involved in an emerging field such as DNA 
nanotechnology is given through the author’s personal research history. Currently, 
the COVID-19 pandemic continues to be unpredictable (January 2022). In Japan, 
the spread of the sixth wave is now feared, and almost all communication between 
researchers is restricted to online contact only. Once the pandemic ends, we will 
again be able to freely hold conferences and more readily collaborate to bring this 
research into the future. 

Professor Seeman passed away during the editing of this manuscript. He first 
revealed the possibility of molecular robotics based on DNA nanotechnology, and 
many researchers in Japan have been directly or indirectly influenced by him. I would
like to express my gratitude for his guidance and offer my sincere condolences on
his loss. 
Thomas H. LaBean


Abstract Research toward the use of nucleic acids as a medium in which to encode 
non-biological information or as a structural material with which to build novel 
constructs has now been going on for 40 years. I have been participating in this field 
for approximately 24 years. I will use my space within this dedicated volume to 
relate some of my personal experiences and observations throughout my own DNA 
nanotech journey. 


1 Discovering DNA Computing 


DNA-based nanotechnology, for me, began in 1994 when Len Adleman’s paper 
“Molecular Computation of Solutions to Combinatorial Problems” came out [1]. I 
was at a meeting out west, and people were talking about this new Science paper 
at lunches and dinners. It was the buzz subject of social gatherings for the week. 
Adleman is the “A” in the RSA encryption standard, and here, he was dabbling 
in actual biochemistry in order to prototype the encoding and processing of non- 
biological information in DNA molecules for what seemed to be the first time ever. 
Adleman used a “generate and sort” strategy to solve an NP-complete problem (i.e., 
the Hamiltonian Path Problem) using a library of DNA molecules as the scratchpad 
upon which a library of possible solutions was written. Contrary to many popular 
press reports at the time, Adleman did not solve the Traveling Salesman Problem; 
he answered the question “is there a Hamiltonian path” through this specific 7- 
node, directed graph, not “what is the shortest Hamiltonian Path?” His system was 
designed such that biochemical laboratory steps could be used to sort through the 
molecules and reject those with incorrect answers recorded upon them (i.e., invalid 
or non-Hamiltonian paths through the graph). Through Adleman’s paper, molecular 
computing or DNA-based computing was introduced to me and to a broad scientific 
audience. Of course, by then, Ned Seeman had already conceived of and been toiling 
within the field of DNA nanotechnology for over a decade; it was just that he toiled
mostly alone and outside the eye of a large audience, and certainly outside of my 
notice, up until then. DNA nanotech entered into my attention and seemingly into the 
wider public attention through the door of computer science (CS), through molecular 
computing. 
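
The “generate and sort” idea can be made concrete with a minimal Python sketch. This is not Adleman’s protocol or his graph: the edge list EDGES, the function name generate_and_sort, the library size, and the use of random walks as a stand-in for random ligation of oligonucleotides are all assumptions chosen for illustration. Each filtering step loosely mirrors one of the laboratory sorting steps (correct endpoints, correct length, every node visited).

```python
import random

# Hypothetical 7-node directed graph (NOT the graph from Adleman's paper).
EDGES = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6),
         (0, 3), (2, 5), (4, 1)]


def generate_and_sort(edges, n_nodes, start, end, library_size=20000, seed=0):
    """Toy 'generate and sort' search for Hamiltonian paths from start to end."""
    rng = random.Random(seed)
    successors = {}
    for a, b in edges:
        successors.setdefault(a, []).append(b)

    # Generate: random walks along edges play the role of randomly ligated strands.
    library = []
    for _ in range(library_size):
        path = [rng.randrange(n_nodes)]
        while (len(path) < 2 * n_nodes
               and path[-1] in successors
               and rng.random() < 0.9):
            path.append(rng.choice(successors[path[-1]]))
        library.append(tuple(path))

    # Sort, step 1: keep only paths with the required first and last node.
    library = [p for p in library if p[0] == start and p[-1] == end]
    # Sort, step 2: keep only paths with exactly n_nodes entries.
    library = [p for p in library if len(p) == n_nodes]
    # Sort, step 3: keep only paths in which no node is repeated,
    # which together with step 2 means every node is visited once.
    library = [p for p in library if len(set(p)) == n_nodes]
    return set(library)


paths = generate_and_sort(EDGES, n_nodes=7, start=0, end=6)
for p in sorted(paths):
    print(" -> ".join(map(str, p)))
```

Note that the sketch only answers whether a Hamiltonian path exists (and exhibits it); nothing in it ranks paths by length, which is the distinction from the Traveling Salesman Problem drawn above. Because the library is random, a valid path can in principle be missed if the library is too small.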

I didn’t join the DNA nanotech community until 1998. At the time, I was a 
postdoc in Jane and Dave Richardson’s lab working on de novo protein engineering. 
We were testing our knowledge of the rules of protein folding by trying to design 
well-folding proteins from scratch. We were deeply involved with the Diversity 
Biotechnology Consortium (DBC) including Jane and Dave, plus Stu Kauffman 
(my Ph.D. advisor and pioneer of “self-organization in complex systems”), Mario 
Geysen (the father of “mimetopes”), Frances Arnold (world-famous protein engi- 
neer/evolver who would later be awarded a Nobel Prize), Andy Ellington (who actu- 
ally coined the term “aptamer”), Pim Stemmer (who invented “sexual PCR”), and 
many other prominent folks with whom I had the good fortune to interact. The DBC 
was busy furthering the quite new field of combinatorial chemistry within the biolog- 
ical realm by designing clever selection techniques and applying them to libraries 
of random-sequence biopolymers (i.e., polypeptides and polynucleotides) in order 
to winnow the vast populations of possible polymers within sequence space and 
find/evolve functional, individual sequences for specific purposes. This focus was a 
direct, logical follow-on from my Ph.D. project in which I randomly searched amino 
acid sequence space for polypeptide strands that were able to fold into compact, 
globular 3D structures as do many evolved, biological proteins [2-5]. 


2 Connections to Broader Scientific Themes 


The more general context of my work at the time was that combinatorial chemistry 
was sweeping the pharmaceutical industry and revolutionizing drug discovery. Then, 
1990 was a very big year, in which new tools emerged that allowed combinatorics 
and in vitro evolution to successfully break through into biochemistry and struc- 
tural biology. The various biopolymer sequence spaces could now be explored and 
mapped. Specifically, Ellington and Szostak [6] and Tuerk and Gold [7] described 
the formation of randomized nucleic acid libraries and in vitro selection of RNA 
molecules with specific sequences that provided self-folding and affinity binding 
to a variety of chemical targets, while Scott and Smith [8] did the same thing for 
polypeptides via the implementation of phage-display libraries. In the late 80s, we 
were pursuing similar goals in Stu Kauffman’s lab, but our early genetic strate- 
gies turned out to be too recombinogenic, and thus stable subclones could not be 
effectively isolated and propagated. 

On a personal front, I was divorced in 1996 and had a clause added to the separation
agreement that my sons remain in Durham for the indefinite future, so that my ex- 
wife could not try to move my sons away from me. While this worked fine for my 
family life, it put a severe crimp in my professional plans, since I was unable to 
perform a national search and pursue a tenure-track, Assistant Professor position
with any geographic freedom. Moving into DNA nanotech enabled instead a slow 
but steady climb up through the Research Professor track at Duke. I was fortunate to 
finally land a tenure-track faculty job as Associate Professor at North Carolina State 
University in 2011, be awarded tenure in 2012 and be promoted to full professor in 
2017. Professionally (as well as scientifically), DNA nanotech has been a great field 
to work in, and it has allowed me to follow an unconventional career path in which 
I have never been an Assistant Professor. 

Back in 1998, the invitation to abandon de novo and library-based protein design 
and start designing novel nucleic acid structures came when John Reif walked across 
Research Drive in the middle of the Duke campus looking for some biochemist to 
teach him about DNA and molecular biology. Another postdoc in Jane’s and Dave’s 
group said there was a crazy, theoretical computer scientist (redundant (?)) looking 
around for someone to talk to. I taught John a lot of very basic stuff like what ‘ligase’ 
was, but he was a quick learner and was offering a decent salary to me if I were to 
dump Biochemistry and join the forces of Computer Science. I ended up making a 
deal with the Richardsons, and we maintained our wet lab operations in the Duke
Biochemistry Department for a number of years until our DNA team grew too large 
and started crowding Jane and Dave in their own space. 

Shifting gears from protein engineering to nucleic acid engineering was not a 
difficult move. In fact, a researcher must be comfortable with the manipulation of 
DNA, through synthetic chemistry, molecular biology, and microbiology procedures 
in order to hope to do experimental protein design. Moving to DNA structural engi- 
neering actually increased the speed of work by eliminating many gene expression 
and protein purification headaches and other tedious steps in the macromolecular 
design cycle. 


3 Ned Seeman: Founder of the Field 


The official start of DNA nanotech (as is reflected in the title of this volume) was 
actually in 1982. Ned Seeman’s paper in the Journal of Theoretical Biology [9] 
challenged people to think about DNA as a structural material instead of as a genetic 
material. The concept was to assemble periodic matter from DNA for various uses 
such as guest-host systems for docking protein molecules to reliably form 3D crystals
for use in X-ray diffraction studies to solve the atomic-scale structures of the guest 
proteins. The way Ned frequently explained it in later years was that he was trying to 
succeed as a crystallographer but was failing to produce high quality protein crystals; 
therefore, no crystals meant no crystallographer, so he had to try something novel. 
Probably most people, like me, did not know anything about the 1982 JTB paper when 
it first came out. I was an undergraduate at the time the paper was published, and it 
would not come to my attention until about sixteen years later. My first introduction 
to Ned’s work was learning about his closed, geometric-like, or topological objects 
such as the DNA cube and truncated octahedron, the family of double-crossover tiles,
and finally the 2D DNA crystals imaged on mica by atomic force microscopy [10]. 

In 1998, John Reif organized a team that brought major DARPA funding to the 
nascent field of DNA nanotech. I signed on at that point. At my first DNA Computing 
conference, DNA4 in Philadelphia, I got to visit my old stomping grounds at UPENN 
and to meet many of the characters who were busy developing the discipline of DNA 
nanotech. Seeman’s lab at NYU was, of course, a hot spot that was producing the 
first rounds of young experimental pioneers in the field including Junghuei Chen, 
Chengde Mao, and Hao Yan. In those early days, people typically were coming 
into DNA nanotech through either chemistry/biochemistry/molecular biology or 
through CS and were joining the community to learn more about the opposite wing 
of the field. People who came in through the CS door included Grzegorz Rozenberg,
Len Adleman, Anne Condon, Richard Lipton, Nataša Jonoska, and Lila Kari.
They worked on widely varying topics including word/code design, encoding strate- 
gies, simulation or theoretical models of DNA computing strategies for addressing 
demanding CS problems (NP-hard or NP-complete problems), or simpler problems 
like molecular implementations of Boolean logic, databasing, cryptography, etc. It 
was also exciting to meet people like physicists Andrew Turberfield and Bernie Yurke 
who were fluent in both experimental and theoretical languages. These thinkers 
expanded the use of DNA in several directions including not only as a structural 
material but also to function as fuel for these new molecular machines [11]. 

At the DNA4 meeting in 1998, I first met Erik Winfree. Mature for his age and
obviously brilliant, he was a graduate student who understood all the scientific back- 
ground implicitly, acted socially with the grace and ease of a senior academic, and 
was responsible for establishing the idea of algorithmic assembly at the center of the 
burgeoning field of DNA nanotech. Erik would soon continue “a family tradition” 
and receive a MacArthur Fellowship, as had his father, Art Winfree (sidenote: Art was a
friend of my Ph.D. advisor Stu Kauffman, so I knew his name and his work long 
before meeting his son, Erik. Also, I have had the privilege of working directly under 
two MacArthur fellows, Kauffman, as well as Jane Richardson). The application of 
algorithmic assembly to DNA computing was a revolutionary concept. Prior to that, 
the “generate and sort” strategy predominated, in which all or many possible solu- 
tions were created within a molecular library, and then, the set was sorted biochemi- 
cally by discarding incorrect or suboptimal solutions to the problem, similar to what 
Adleman did in 1994. On the other hand, algorithmic assembly was a molecular 
implementation of Wang tiling, a theoretical computing model in which colored tile 
edges specify allowed assembly rules and a small tile set is capable of generating 
very large, complex, programmed patterns. Algorithmic assembly allowed imple- 
mentations in which only correct answers would be assembled in the first place. This 
opened the field to a whole world of new possibilities and pushed creative thinking 
in diverse directions. 
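
As a toy illustration of the contrast, the following Python sketch treats tiles the way the Wang-tiling picture does: each tile is defined only by its edge labels, and a tile may attach only where its labels match the edges already exposed by the growing assembly. The tile set here computes a running XOR of an input bit string. The names TILES and assemble, and the choice of a seed edge labeled 0, are assumptions made for this sketch; it abstracts away all of the DNA chemistry and is not a model of any published tile design.

```python
from itertools import product

# One tile per (left edge, bottom edge) pair of bits.
# The tile's right edge carries the XOR of those two labels.
TILES = {(left, bottom): left ^ bottom
         for left, bottom in product((0, 1), repeat=2)}


def assemble(inputs, seed_edge=0):
    """Grow a row of tiles left to right over a row of input bits.

    At each site exactly one tile matches both the exposed right edge of the
    previous tile and the input bit below, so only the correct answer forms.
    """
    outputs = []
    exposed = seed_edge
    for bit in inputs:
        matches = [t for t in TILES if t == (exposed, bit)]
        if len(matches) != 1:
            raise ValueError(f"no unique tile fits edges {(exposed, bit)}")
        exposed = TILES[matches[0]]  # right edge of the tile that just attached
        outputs.append(exposed)
    return outputs


print(assemble([1, 1, 0, 1]))
```

Unlike the generate-and-sort sketch, nothing is discarded here: the matching rules leave exactly one tile that can bind at each site, so the assembly itself is the computation.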

Directly following DNA4, John Reif and I rented a car and drove Ned back to New 
York where we visited his lab for a couple of days. I enjoyed the distinct privilege of 
going up to Ned’s apartment in NYU faculty housing and seeing exactly what you 
would expect from a dedicated bachelor/scientist in Manhattan: a room lined with 
overflowing bookshelves and a single reading chair in the middle of the room. I was
amused to note that the well-worn reading chair had leaked a small pile of crumbled 
polyurethane foam that rested at the angle of repose right where it had fallen out of 
the back of the aging chair. It just looked like he didn’t waste time with unimportant 
distractions like decorating or cleaning. My other fun memory of Ned came a couple 
years later when he was talking at a meeting somewhere in Europe (I think), and he 
whipped his belt off from around his waist to use it as a visual aid for demonstrating 
DNA supercoiling. The funny thing was his joke announcement, something like: 
“Sorry to disappoint you, folks, but that’s as far as it goes.” There have been so many 
personal and professional memories surrounding lots of characters in the field of 
DNA nanotech, and I have very much enjoyed being part of this community. 

At DNA4, I was just joining the DNA nanotech club, observing and learning, 
but by DNA5 in 1999, I was deeply embedded in the field and co-authored three
papers at that conference. One of my contributions included the first use of the term 
“scaffold strand” to indicate a long strand around which other oligos would assemble 
and generate a structure larger than a standard “tile” [12]. I took Seeman’s concept 
of a “reporter strand,” a strand that is ligated together only within the context of the 
desired, assembled structure, and inverted it by preassembling the long “scaffold” 
so that it could act as a nucleation element for a larger superstructure. The concept 
of scaffold strands was further developed in [13] where a “fully addressable” 2D 
structure was illustrated and proposed. We also proposed another structural variant 
of a scaffolded assembly that same year [14]. Years later, Paul Rothemund mentioned 
to me that these early uses of scaffold strands and schematic proposals for scaffolded 
structures were used against him as pre-existing technology when the US patent office 
first turned down his patent application describing DNA origami, a big mistake, in 
my opinion, on the part of the patent examiner. 


4 Personal Milestones 


Among the milestones of DNA nanotech that I am proudest to have been a part 
of, I would include our 2000 demonstration of cumulative XOR computation using 
assembly of TX tiles [15]. This was the first published experimental demonstration of 
a molecular computation by algorithmic self-assembly using DNA tiles. Winfree and 
others would create more impressive control and computational scale soon enough 
[16], but it was fun to be on the cutting edge however briefly. This community has 
been highly collaborative and cooperative even while also being competitive.

While we were still using labs in the Biochemistry Department, I managed to
lure “Ned’s best student,” Hao Yan, to join us at Duke in 2001, and thus ensued a
number of highly productive years, often centering around the “golden hands” of my 
first graduate student, Sung Ha Park. At that time, Sung Ha could get any experi- 
ment to work; he calmly solved many complex, sticky experimental problems that 
other people banged their heads against unsuccessfully. Hao was known around the 
group as “the finisher” because he could see, plan, and execute the final experiments 
necessary to finish a story and complete what was needed for the next publication.
Hao then moved on in 2004 to set up his own research group at Arizona State, and 
somehow that felt like the “early years” of the field were starting to come to a close. 

Another major event of 2004 was the publication of William Shih’s DNA octahe- 
dron [17]. This was the first time I had ever heard William’s name, but not surpris- 
ingly, thereafter, I would never forget it. A few extremely impressive things about 
the groundbreaking accomplishments of that report include: self-folding (via thermal 
annealing) of a long molecule with the assistance of a few “helper strands,” production 
of the nanostructure by DNA polymerase (rather than solid-phase chemical synthesis 
of short oligonucleotides), and surprisingly detailed cryo-EM structure evaluation. 
These attributes went against the grain of many long-held tenets and traditions of the 
field and presaged several revolutions to come. 

In 2006, I was again proud to be at the cutting edge of the field by helping 
to create what was, at the time, the “largest” (since we were trying to construct
nanostructures of increasing size), fully addressable 2D array; these were 80 × 80 nm
grids with 16 pixels that were “turned off and on” by programmable binding
of avidin protein molecules [18]. We compared a couple of hierarchical assembly 
strategies and employed the tile-lattice method. Our accomplishment and “world 
record” held for only a couple of months and was soon shattered by the advent of 
DNA origami. 

Paul Rothemund’s description of DNA origami in 2006 changed all the rules of 
DNA nanotechnology [19]. First of all, a single-author, cover-of-Nature paper was 
relatively anomalous but not too surprising once you got to know Paul and his body 
of work which was and is uniformly groundbreaking, intellectually deep, and widely 
divergent in topics. The paper swept away several design constraints that the field
had been laboring under: (1) Religious re-use of a small number of specific nucleotide
sequences at Holliday-junction-like strand exchange points for crossovers was no longer
necessary (previously everyone only used variants of the J1 junction sequence first
worked out by Seeman); Paul showed that essentially any sequence would work. (2) Exact strand stoichiometry
matching during assembly reactions was no longer an issue; excess staple strands 
effectively folded “all” of the available scaffold. (3) Impure oligonucleotides (i.e., 
standard desalt from IDT instead of laborious, in-house, PAGE purification of each 
individual oligo) did not affect assembly yield. These changes may not seem like 
much from today’s perspective, but at the time, they arrived as a major, earth- 
shattering revolution. It also felt like almost everybody shifted gears and got into 
the origami game, including quite a few people who came into DNA nanotech on the 
wave of relative ease with which origami structures could be adapted to different uses, 
modified, and even designed anew from scratch. However, completely redesigning an 
origami cost several thousand dollars back then, so the basic cost of a new architec- 
tural design increased significantly versus the old repetitive, tile-and-lattice strategy. 
There also came an immediate echo from China that heralded the presence of another 
brilliant, self-motivated student when Lulu Qian rapidly and independently designed 
and assembled an origami in the shape of China [20]. Extension of DNA origami 
into all types of massive and fantastic 3D shapes has come from William Shih’s 
group, especially during the Douglas, Dietz, Liedl, Högberg years, and continued
subsequently in their individual labs and those of many others. 


5 The End of the Early Years


In attempting to restrict this missive to a reasonable length and also to the “early 
years” of DNA nanotech as stated in the title, I will quickly point to a few more 
important developments and then conclude. Another key signpost for the field came 
when the single-stranded tile (SST) strategy was implemented and first published in
2008 by Peng Yin [21]. Peng had been gone from the Duke group for a while by 
then, but this work resulted from his time in Durham. We had been working on a 
project that Sung Ha Park and I referred to as “tile-less lattice” where individual 
oligos would assemble upon the growing lattice without first forming tile-like units. 
The most successful design using this concept was finalized by Peng while he was 
at Caltech and became the programmed tube circumference strategy which subse- 
quently became the 2D [22] and finally 3D SST DNA bricks or DNA Lego architec- 
tures which are now a successful subfield of DNA nanotech. Major efforts are now 
under way to build ever larger self-assembled DNA materials including the incred- 
ible 2D multi-origami, hierarchical structures of Lulu Qian’s group and wireframe 
tessellated 3D structures from several groups. Progress has also been breathtaking 
on functional DNA hydrogels for soft robotics by Rebecca Schulman and others. 
Single-stranded RNA origami has been a thing since 2014 [23], and I have recently 
had the good luck to co-found a startup called Helixomer to commercialize an 
RNA origami-based anticoagulant (with reversal agent) for human health care. There 
are a number of startup companies now bringing DNA nanotechnology in various 
forms to the marketplace. 

Following its founding by Ned Seeman in 1982, the field of DNA nanotechnology 
has grown up through the efforts of a large and expanding community of researchers. 
Ned also founded the International Society for Nanoscale Science, Computation, 
and Engineering (ISNSCE) (in which I am currently serving as President). The 
society has sponsored two conferences per year for a long time: DNA-X focuses on 
computational aspects of DNA nanotech while FNANO draws in a larger community 
of people studying self-assembly in nucleic acids but also in other molecular systems. 
I believe that there have already been 26 annual DNA-X meetings (of which I have 
been to 11) and 18 FNANO meetings (of which I have been to 17). FNANO has 
always been held at the Snowbird resort in Utah, partially due to John Reif’s lifelong 
love of downhill skiing. DNA-X has been held alternately in North America, Europe, 
North America, Asia (repeat) for quite some time (until the pandemic forced it online 
in 2020). Traveling to the DNA-X conferences, as well as to work in the labs of 
collaborators in Denmark and China, has provided me with fantastic opportunities to 
explore the world. Memories of hanging out with the Italians in Japan, for example, 
still bring a smile to my face. These travels have led to lifelong friendships. 
During this journey through the world with DNA nanotech, I have managed 
to amuse myself at various times by embedding personal symbols and souvenirs 
or unnecessary cultural references within the scientific literature. Some examples 
include a photograph of William S. Burroughs that I put into a cryptography paper, 
the phrase “sub-Lilliputian holiday” I tucked into a book review, and a molecularized 
likeness of the Venus de Milo placed in a news-and-views piece. I think people have 
to play small games like that just to keep things interesting and as minor acts of 
mischief even if the only one in on the joke is yourself. If someone else had written 
this type of piece, it would almost certainly highlight a different subset of people and 
events, but this is the way I remember the unfolding of DNA nanotech. I apologize 
if anyone is offended or hurt by anything I have written or failed to write in this 
short memoir-style paper. My intention is simply to record some of my memories 
for posterity, and I am not trying to cause trouble, attack/insult anyone, or grind 
any particular axes. When I received the invitation to contribute to this volume, the 
editors described a colorful array of possible essay types that they were imagining 
and hoping to include. This sort of personal history and reminiscence was the option 
that caught my interest. I am hoping to be able to read similar musings from other 
longtime members of the DNA nanotechnology community because, of course, my 
idiosyncratic points of view about events and developments through time are neces- 
sarily limited. Still, I hope this slim contribution adds to the overall historic document 
that the editors have brought into the world. 

This volume is now imbued with added poignancy since Ned Seeman passed away 
during the month before the due date of this manuscript. His creativity, irreverence, 
and intelligence will be sorely missed in our community. 


Abstract From a programming perspective, DNA is stunningly simple: a string of 
bits coding two types of interactions. The specific chemical form of DNA given to us 
by evolution imposes significant constraints on what is possible with DNA nanotech- 
nology. In this paper, I propose three designs for new digital DNA-like polymers that 
retain the essential information-bearing properties of DNA while enabling functions 
not achievable with DNA such as greater stability, programmability, and precision. 


A scientist working in the field of DNA nanotechnology can be very successful 
without knowing how to draw the chemical structure of DNA. Indeed, from a 
programming perspective, DNA is stunningly simple: a string of bits coding two 
types of interactions, i.e., a digital polymer (Fig. 1, left). But from a chemistry 
perspective, DNA is exceedingly complex: heterocycles, sugars, and phosphates are 
organized via a diverse arsenal of chemical bonds (covalent, hydrogen, ionic, and 
π-π stacking) to give just the right structure enabling robust recognition by many 
complex biomolecules for reading, copying, evolution, and a multitude of other 
biological functions (Fig. 1, right). 
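
The programmer's view described above can be made concrete with a minimal sketch. The 2-bit encoding below is an illustrative assumption (any assignment of the four letters to two bits would serve), and the function names are hypothetical. The only chemistry it models is the two Watson-Crick pairings, A-T and G-C.

```python
# Illustrative sketch: DNA as a digital polymer (2 bits per base).
# The bit assignment is an arbitrary assumption, chosen so that
# complementary bases differ in both bits (A=00 <-> T=11, C=01 <-> G=10).
BASE_TO_BITS = {"A": 0b00, "C": 0b01, "G": 0b10, "T": 0b11}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(strand: str) -> list[int]:
    """Map a 5'->3' strand to a list of 2-bit integers."""
    return [BASE_TO_BITS[base] for base in strand.upper()]


def reverse_complement(strand: str) -> str:
    """Return the strand that pairs with `strand` in antiparallel orientation."""
    # With this encoding, complementing a base means flipping both bits.
    return "".join(BITS_TO_BASE[bits ^ 0b11] for bits in reversed(encode(strand)))


def binds(strand_a: str, strand_b: str) -> bool:
    """True if the two strands are perfect Watson-Crick complements."""
    return reverse_complement(strand_a) == strand_b.upper()
```

For instance, `binds("AAC", "GTT")` evaluates to true under these rules, while a single mismatch such as `binds("AAC", "GTA")` does not. Real hybridization is of course graded rather than all-or-nothing; the sketch captures only the digital abstraction.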

This specific form of DNA given to us by evolution imposes significant constraints 
on what is possible with DNA nanotechnology. For example, G-quadruplex formation 
limits sequences available for strand displacement, and cleavage by nucleases limits 
in vivo utility of DNA-only structures. Nevertheless, DNA remains the most powerful 
molecule for molecular programming due to the simplicity of its programming rules 
and its practical advantages including: (i) fast and cheap custom synthesis (e.g., by 
Integrated DNA Technologies and Twist Bio), (ii) mature design and modeling tools 
(e.g., scadnano [1], Peppercorn [2], MagicDNA [3], NUPACK [4], and OxDNA [5]), 
(iii) ability to amplify copy number (e.g., by PCR), and (iv) fast and cheap sequencing 
(e.g., by Pacific Bio and Oxford Nanopore). In this paper, I will describe three new 
digital polymers that can enable functions not achievable with DNA while retaining 
the essential information-bearing properties and some practical advantages of DNA. 

Fig. 1 DNA from the perspective of a molecular programmer (left) and a chemist (right) 


An astute reader may be wondering, “This is a very ambitious proposal. People 
have worked on DNA alternatives for many years. What about all the existing poly- 
mers such as PNA, GNA, and other XNAs?” Yes, many people have worked on 
DNA alternatives for decades exploring questions ranging from pure curiosity about 
molecular origins of life [6] to very practical goals of improving microarray sensors 
[7]. Typically, these alternatives have been focused on creating molecules possessing 
a specific attribute of DNA, most notably (a) transcription and translation by enzymes 
[8], (b) ability to recognize DNA sequences while being more stable [9], and (c) infor- 
mation storage [10]. These alternatives include GNA (a DNA analog in which sugar 
is replaced with a more stable glycerol [11]), phosphorothioate DNA (a DNA analog 
in which oxygen in phosphate is replaced with sulfur [12]), PNA (a DNA analog in 
which sugar is replaced with a neutral peptide-like backbone [13]), and L-DNA (a 
DNA analog with opposite chirality to DNA [14]). It may seem surprising, but no 
one, to my knowledge, has tried to redesign DNA to enlarge its functionality for the 
purposes of DNA computing or DNA nanotechnology. As the new digital polymer 
designs below will demonstrate, without the constraint to bind natural DNA or be 
processed by natural enzymes, we have much more freedom in molecular design. 

Before describing the new polymer designs, it is instructive to first consider which 
desirable functions of a digital polymer DNA currently lacks, and how these might be 
achievable by existing approaches without building an entirely new polymer. Significant 
limitations of DNA, summarized in Table 1, include low stability, limited chemical 
functionality with a four-letter alphabet, and lack of bio-orthogonality. Generally, to 
overcome these limitations, chemical modifications or non-DNA coatings are used. 
However, these modifications are frequently hard to implement for non-chemists and 
often disrupt the desired structure of DNA. 

Table 1 Desired properties/abilities and approach to achieve them with DNA-based structures 

| Property | New abilities (examples) | How to do it with DNA (examples) | Limitations of doing it with DNA (examples) |
|---|---|---|---|
| Improved stability in vivo | Wider biomedical applications | Ligation, increasing cross-overs, chemical modification, coatings [15] | Coatings reduce surface access; chemical modifications can be costly and disrupt DNA structure |
| Improved thermal stability | Wider applications in hot environments | T-T welding [16], ligation [17] | Can distort the structure |
| Solubility in non-polar solvents, such as hexane | Compatibility with industrial processes such as spin casting | Attaching alkyl chains [18] | Limited to making a small number of modifications |
| More chemical variety | More compact and stronger aptamers | Biological and chemical synthesis [19] | Modifications can be costly and disrupt dsDNA structure |
| More than four letters (ACGT) | Increased information density | Biological and chemical synthesis [8] | Modifications can be costly |
| Higher density breadboard | Better control over electronic and biological coupling | Use square lattice origami (~2.5 nm spacing) [20] | Need extremely closely-packed DNA |
| Response to stimuli (pH, light, RedOx) | External control of binding | Hoogsteen pairing designs, chemical modifications [21] | Modifications can be costly and disrupt dsDNA structure |
| Wider range of hybridization energies | Stronger than G-C pairs can reduce leak by minimizing fraying/breathing | Currently cannot be done | N/A |
| Controlled helical pitch | Low pitch, less knots | Currently cannot be done | N/A |
| Fluorescent base options | Biophysics studies, in vivo tracking | Chemical synthesis [22] | Modifications can be costly and disrupt dsDNA structure |
| Bio-orthogonality | Minimize interference with physiology | L-DNA [23] | Modifications can be costly and disrupt dsDNA structure |
| Conductivity or semi-conductivity | Nanoelectronics | Regular DNA is a poor conductor [24], but it can be coated with metals [25] or semiconducting polymers [26] | Poor performance and uniformity of coated nanostructures |
| Flexibility control | Nanomechanical devices | Adding defects or flexible linkers in multi-strand structures [27] | Limited control for ssDNA |

While combining all the above functions in one polymer is neither feasible nor 
necessary, I propose three new polymers below that draw from this pool of functions 
to create DNA analogs that better serve specific tasks. For each polymer, I 
will specifically discuss redesign of the recognition elements and backbone, the new 
functions this enables, and synthesis. The first polymer, NP1, marginally redesigns 
DNA bases while introducing a simpler and more modular α-peptide backbone. This 
should expand its chemical vocabulary and programmability, among other advantages. 
A second, more ambitious polymer design, NP2, proposes a recognition code 
based entirely on natural peptides. This should also increase chemical functionality 
while enabling scalable production in vivo. A third polymer, NP3, aims to make a 
covalent recognition code for applications requiring very stable architectures. This 
should yield polymers with ultimate stability, e.g., for making portable devices. I 
hope that the ideas explored in these three designs can lead to an extended toolkit 
for molecular programming that will enable nanotechnologies currently impractical 
with DNA. 


1 New Polymer 1 (NP1) 


The design of NP1 builds on the successes of DNA, proteins, and peptide nucleic 
acid (PNA) while adding new functions unattainable by any single class of these 
molecules such as higher stability and larger design space. First, I propose to “fix” 
DNA bases for a slimmer and stronger code. Then, I explain why an α-peptide 
backbone is better than any other backbone for a digital polymer, especially when it 
comes to scalable production of the new polymer. Finally, I provide an overview of 
related existing approaches and discuss ways to overcome potential pitfalls. 

Recognition elements. The recognition between two strands of NP1 is based on 
unmodified DNA bases (T and C) and modified DNA bases (A and G). First, to enable 
denser molecular pegboards, I propose to make the double helix slimmer by replacing 
purine bases with pyrimidines, effectively chopping off the 5-membered ring (Fig. 2, 
blue fragments; compare to DNA). One concern is that this “slimming surgery” 
may disrupt the balance between hydrophobic and hydrophilic interactions that has 
been finely tuned during molecular evolution, thus destabilizing the helix. Another 
concern is that this design goes against the size complementarity hypothesis, which 
states that large purines must pair with small pyrimidines. In fact, however, “skinny” 
DNA where purines are replaced with pyrimidines preserves its structure and even 
gains stability [28]. Another advantage of removing the 5-membered heterocycle is 
that this eliminates guanine’s ability to form quadruplexes, a big nuisance for some 
DNA applications such as information retrieval in G-rich DNA sequences for data 
storage. 

To improve the stability of A-T pairs, I propose to add an amino group to the 
modified adenine (Fig. 2, red). This converts an asymmetric double hydrogen bond to 
a symmetric triple hydrogen bond, which has been shown to increase duplex stability 
while precluding stray recognition by some biomolecules normally binding to the 
minor groove [8]. To preserve hydrogen-bonding motifs, I propose to move the nitrogens 
in the new pyrimidines away from the ring location linked to the backbone. The 
reason I retain the nitrogens is that their electron withdrawing effect is important to 
maintain the same energy of lowest unoccupied molecular orbital (LUMO) localized 
on the lone pair of the hydrogen-bonding amino group in position 2 of the new 
pyrimidine analog of adenine. One can also achieve a similar electron withdrawing 
effect by introducing a nitro group in position 6 of the ring [28]. Another reason 
to move nitrogen to the new position 3 is that it is easier to synthesize such a base 
compared to one with carbon. 

Fig. 2 Structures of DNA, PNA, new polymer 1 (NP1), and NP1 with peptoid ACGT analogs 

Backbone. Unlike PNA, NP1 organizes hydrogen-bonding elements on a simpler 
yet more versatile natural α-amino acid backbone. This can potentially allow adaptation 
of not only existing chemical but also biological peptide syntheses to produce 
NP1. Adding a canonical amino acid after every recognition-bearing non-canonical 
amino acid (Fig. 2, colorful Rs) also allows a great expansion of the chemical functionality 
available to NP1 compared to DNA or PNA. A larger chemical space 
afforded by more diverse and numerous amino acid residues compared to nucleotides 
is likely one of the main reasons why Nature has eventually chosen peptides and not 
polynucleotides to build most molecular machines. 

To keep the initial designs as close to DNA as possible, I propose to place 
hydrogen-bonding bases on every second amino acid, which should give a base 
spacing very close to the ones in DNA and PNA (PNA forms a double helix with 
18 bases per period versus 10.5 for DNA [29]). Unlike PNA, NP1 is designed to 
be chiral by adding chiral (L-) ACGT amino acids and natural L-amino acids. As 
a reminder, PNA lacks chirality due to the rapid inversion of the tertiary prochiral 
nitrogens. This inversion leads to racemic mixtures of left- and right-handed helices, 
as well as promiscuity in C→N polarity of the peptide backbone, an equivalent of 
5'→3' polarity of the sugar backbone that enforces the antiparallel requirement for 
DNA helices. While it is possible to add chirality to the PNA backbone by synthesizing 
γ-modified PNA, NP1 is more modular compared to such a γ-modified PNA, 
because NP1 requires only 24 monomers (four ACGT analogs and 20 α-amino acids) 
to enable more design freedom than possible with 80 γ-modified PNA monomers 
(4 × 20). Additionally, while PNA’s neutrality makes it challenging to deliver in vivo, 
NP1’s hydrophilicity may be tuned by incorporating a variety of hydrophilic and 
hydrophobic amino acid residues, enabling applications in media with a range of 
lipophilicities from water to hexane. However, care should be taken to avoid unsuitable 
combinations, including large regions of highly polar side chains or unbroken 
polar-non-polar alternating regions, which could lead to amphiphilic or sheet-forming 
behavior. 

Looking at DNA through the eyes of a chemist also presents a tantalizing opportunity 
to explore the design space of two-stranded helical molecules. For example, in 
NP1, nucleobase analogs can be integrated more sparsely (with more natural amino 
acids inserted between the ACGT analogs compared to just a single amino acid in 
Fig. 2), allowing natural biologically active peptide fragments to be organized with 
a precision and complexity infeasible using current protein engineering approaches 
[30]. Also, by varying the number and the type (α, β, γ, δ) of amino acids, it should 
be possible to control the helical pitch, expressed here as the rotation angle between 
two adjacent stacks, which is 34.3° for DNA and 20° for PNA. In an extreme case, it 
may be possible to reduce the rotation per base pair to 0° by directly linking two 
nucleobase analogs (skipping a canonical amino acid). This is because the distance 
between two adjacent nucleobase analogs attached to a fully stretched α-amino acid 
peptide backbone is 3.5 Å, which means that the scaffold almost does not need to 
twist for the bases to reach the typical stacking distance of 3.4 Å. 
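
The angles quoted above follow from dividing a full turn by the number of bases per helical period. A minimal sketch of that arithmetic, using only the periods given in the text (10.5 for DNA, 18 for PNA); the function name is hypothetical:

```python
def twist_per_base_pair(bases_per_turn: float) -> float:
    """Rotation in degrees between adjacent stacked base pairs."""
    if bases_per_turn <= 0:
        raise ValueError("bases_per_turn must be positive")
    return 360.0 / bases_per_turn


# Periods taken from the text: 10.5 bases per turn for DNA, 18 for PNA.
dna_twist = twist_per_base_pair(10.5)
pna_twist = twist_per_base_pair(18)

print(f"DNA: {dna_twist:.1f} degrees per base pair")
print(f"PNA: {pna_twist:.1f} degrees per base pair")
```

In this picture, the untwisted extreme case discussed above corresponds to the number of bases per turn growing without bound, so that the rotation per base pair approaches 0°.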

Synthesis. To realize its practical potential, a new digital polymer should be 
amenable to automated synthesis, allowing efficient production of custom sequences 
as can be done now for DNA, peptides, and oligosaccharides [31]. Initial efforts 
may most profitably be dedicated to the synthesis of the four amino acid nucleobase 
analogs that are compatible with the standard automated peptide or peptoid 
syntheses [32]. Next, the effect of sequence and other design elements on the structure 
of the double-stranded helix and simple multi-strand constructs should be investigated 
using nuclear magnetic resonance (NMR), X-ray crystallography, and atomic force 
microscopy (AFM). 

Once optimal structures are elucidated, efforts should be focused on developing 
larger-scale sustainable synthesis. One approach can be to adapt natural cellular 
protein synthesis machinery to produce large amounts of NP1; this will leverage the 
last two decades of rapid progress in techniques for co-translational incorporation 
of unnatural amino acids into proteins produced in cells via genetic code expansion 
[33]. Finally, directed evolution may be used to modify components of the translation 
machinery such as synthetase/tRNA pairs to selectively incorporate the four new 
amino acids with A, C, G, T analog residues first in cell-free media and then in cells 
[34]. 

Existing approaches to design a simple non-covalent recognition code. Much 
work has been done to create digital polymers with recognition between strands based 
on hydrogen bonds and other non-covalent interactions [35-38]. However, a DNA- 
like polymer with robust recognition of DNA has not yet been reported, partially 
because creating such a digital polymer for the purposes of DNA nanotechnology 
was not the main reason for these efforts. 

Potential pitfalls and ways to overcome them. Experts on protein structure may 
have a valid concern that instead of the desired double helix, NP1 may form α-helix, 
β-sheet, or other motifs common in proteins due to the presence of carbonyls and 
NH groups. I expect that the three locally organized hydrogen bonds in the base pairs 
will be more favorable than the two relatively remote hydrogen bonds of α-helices or 
β-sheets. Also, proline, an amino acid that lacks NH, can be sparsely added to break 
up α-helices or β-sheets. Yet another way to minimize spurious hydrogen bonding is 
to use peptoid versions of ACGT where the base is attached to the nitrogen (Fig. 2). 
As a note, PNA has the capacity to form spurious NH-OC hydrogen bonds (half as 
many as α-peptides but the same number as the peptoid version of NP1) but still prefers 
to form a double helix. However, I propose to keep regular α-amino acid monomers 
for non-ACGT monomers instead of fully switching to peptoids to retain the desired 
chirality and higher chance for in vivo synthesis. In planning in vivo applications, 
one should keep in mind that while gaining stability to nucleases, NP1 will become 
susceptible to proteases. In addition, one should consider that NP1 may bind DNA, 
especially in a single-stranded form, due to the similarity of its recognition elements 
to DNA bases. 


2 New Polymer 2 (NP2) 


Design. NP1 requires developing synthesis and incorporation of synthetic DNA base 
analogs. Is it possible to avoid this and design a new DNA-like polymer entirely from 
natural amino acids? The design of NP2 also relies on the natural α-amino acid peptide 
backbone. However, instead of familiar ACGT analogs, this design seeks to construct 
recognition elements from natural amino acid side chains (Fig. 3). Hydrophobic 
interactions such as V-L or F-W or single hydrogen bonds in pairs such as S-T or C-Y 
are too weak and not specific enough to provide a reliable code. One excellent candidate 
is the R-D pair, which is maintained by two hydrogen bonds and electrostatic attraction. 
This interaction is likely even stronger than G-C hydrogen bonding. In principle, it is 
possible to construct a digital polymer using just a single heterophilic interaction 
(i.e., binding partners are different) such as G-C or R-D, but this would substantially 
increase possible spurious interactions and decrease information density for the same 
length compared to a polymer with two heterophilic interactions such as DNA. Therefore, 
it is desirable to introduce another specific interaction that is as orthogonal to 
R-D as possible. Since the only other available heterophilic interaction, K-E, also 
involves a carboxylic acid (E, on a slightly longer leash than D) and will likely bind 
to R, I suggest using a homophilic interaction (i.e., binding partners are the same) 
such as N-N or Q-Q or Q-N. This interaction is also maintained by two hydrogen 
bonds, but the partners are not charged as in the R-D pair. The lack of charges should 
bring HOMO and LUMO of binding partners closer, making this interaction more 
orthogonal to R-D. Among the three possible interactions (N-N or Q-Q or Q-N), 
I show N-N in Fig. 3, because according to preliminary molecular dynamics calculations 
performed in my group, N-N gives more stable double-stranded complexes 
compared to Q-Q, Q-N, and N-Q. R, D, and N essentially comprise a three-letter 
code which will likely have more spurious interactions than a four-letter code (A, 
T, G, C). My group is currently studying a series of complexes such as shown in 
Fig. 3 (R = G and L) by NMR, AFM, and X-ray diffraction. P is inserted to disrupt 
potential α-helices or β-sheets as discussed for NP1. 
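
The pairing rules proposed above (heterophilic R-D, homophilic N-N) can be written as a lookup table. The sketch below is only an illustration: it considers recognition residues alone, ignoring spacer residues such as P, and it assumes antiparallel pairing by default, which the text does not specify. All function names are hypothetical.

```python
import math

# Pairing rules taken from the text: R pairs with D (heterophilic),
# N pairs with itself (homophilic). DNA rules are included for comparison.
NP2_PARTNER = {"R": "D", "D": "R", "N": "N"}
DNA_PARTNER = {"A": "T", "T": "A", "G": "C", "C": "G"}


def partner_strand(code: str, rules: dict[str, str], antiparallel: bool = True) -> str:
    """Return the recognition sequence expected to pair with `code`.

    `antiparallel=True` is an assumption made for illustration only.
    """
    paired = [rules[letter] for letter in code]
    if antiparallel:
        paired.reverse()
    return "".join(paired)


def is_self_complementary(code: str, rules: dict[str, str], antiparallel: bool = True) -> bool:
    """True if a strand can form a duplex with a copy of itself."""
    return partner_strand(code, rules, antiparallel) == code


def bits_per_position(rules: dict[str, str]) -> float:
    """Upper bound on information per recognition position: log2(alphabet size)."""
    return math.log2(len(rules))
```

Under these assumptions, `bits_per_position(NP2_PARTNER)` is log2(3), roughly 1.58 bits, against 2 bits for the four-letter DNA alphabet. This is one way to quantify the lower information density of a three-letter code noted above.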

Existing approaches to design a simple recognition code with peptides. The 
idea that it is possible to construct a recognition code similar in simplicity to A-T 
and G-C of DNA with natural amino acids may seem preposterous to some. “If 
there had been such a simple peptide motif, Nature (not the journal) would surely 
have found it by now. If not Nature, then David Baker,” an active reader may think. 
And the reader would be partially right: Some recognition elements have indeed 
been identified both by stochastic natural evolution and more rational protein design. 
However, these motifs are still not as simple, programmable, and practical as A-T 
and G-C of DNA, as I explain next. 

Some obvious hydrogen-bonding recognition analogs of A-T and G-C mentioned 
above are R-D, R-E, K-D, K-E, N-N, Q-Q, and Q-N with their two hydrogen bonds. 
A Protein Data Bank (PDB) search indeed reveals quite frequent intra- and interpeptide 
contacts at distances shorter than the sum of van der Waals radii for these pairs, 
indicating bonding. So, Nature does use them, but it does so very rarely compared to 
NH-CO hydrogen bonds and hydrophobic residue interactions that dominate protein 
folding landscapes. Learning from the wealth of protein structures in PDB, protein 
designers have made huge progress in predicting unknown structures (e.g., with 
Rosetta [39] and AlphaFold [40]) and designing new ones [41]. However, even the 
simplest recognition codes elucidated so far are still not as compact and modular as 
A-T and G-C of DNA, limiting their practical use [42]. 

Fig. 3 Structures of the 20 canonical amino acids (left) and an example structure of New Polymer 2 
duplex (right) made with a self-complementary strand. Red Rs can potentially be various side chains 
from the amino acids on the left 

The great wealth of natural PDB structures combined with machine learning 
algorithms is very powerful, but unlikely to uncover a conceptually new code. The 
current protein world is biased to represent structures achievable by evolution due 
to its evolutionary pathway to complexity. To overcome this bias and build a new, 
potentially better protein world we need to develop design principles that are not 
based on reinforcing the current set of “rules.” If we are successful in elucidating 
these new more rational design principles, we will potentially be able to build a new 
protein world and even new forms of life that can live longer and healthier than us. 
After all, we humans possess the power of systematic rational design as well as tools 
of directed and accelerated evolution [43-46]. But let us start by designing a new 
digital polymer. 

Potential pitfalls and ways to overcome them. Even though the recognition 
elements in NP2 are placed with regular spacing (every 6 sigma bonds) on the peptide 
scaffold, the resulting minimum energy conformation of a double-stranded complex 
may not be as periodic as the DNA double helix, where a regular 3.4 Å spacing is 
ensured by π-π stacking. This potential departure from uniform periodicity, which 
Schrödinger’s aperiodic crystal requirement for information-bearing polymers would 
demand (Schrödinger postulated that this requirement is needed for recognition by 
enzymes), is not necessarily a problem for many applications such as strand displacement. 
Strand displacement has already been demonstrated with α-helix based peptides and 
would benefit significantly from the acceleration that is expected for a more flexible 
single peptide chain compared to a rigid α-helix structure [47]. 

To address the recurrent concern about α-helix or β-sheet formation, I expect 
the hydrogen bonds of the code (R-D and N-N) to be more favorable than NH-OC 
hydrogen bonds, as in NP1. As with NP1, this is expected because more compact 
interactions are entropically more favorable than the spread-out NH-OC interactions. 

When planning synthesis, one should keep in mind that even at high coupling 
efficiency, solid-phase reaction yields drop off rapidly, limiting peptide 
or peptoid strands to sub-50-mer lengths (99% efficiency per step gives less than 
80% yield after 25 steps [48]). Actual coupling efficiencies may be lower, depending 
on the monomers used. However, the hybridization energy per base will likely be higher 
compared to DNA due to the absence of negative charges on single strands. This may 
enable shorter strands of NP2 to be sufficient for building nanostructures and behaviors 
analogous to ones in structural and dynamic DNA nanotechnology. 
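
The yield estimate quoted above is compound arithmetic: if every coupling step succeeds with the same probability, the fraction of full-length product is that probability raised to the number of steps. A minimal sketch, assuming identical and independent steps (a simplification, since real efficiencies vary by monomer); the function name is hypothetical:

```python
def stepwise_yield(coupling_efficiency: float, steps: int) -> float:
    """Fraction of full-length product after `steps` sequential couplings.

    Assumes every step has the same efficiency and that steps are independent.
    """
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie between 0 and 1")
    if steps < 0:
        raise ValueError("steps must be non-negative")
    return coupling_efficiency ** steps


for n_steps in (25, 50):
    print(f"{n_steps} steps at 99% per step: {stepwise_yield(0.99, n_steps):.1%}")
```

With 99% per-step efficiency this gives roughly 78% after 25 steps and roughly 61% after 50 steps, consistent with the figure cited in the text.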

Future outlook. A digital polymer made of only natural amino acids could be 
synthesized in vivo, enabling a multitude of biomedical applications for molecular 
programming—imagine being able to shape proteins into structures similar to DNA 
or RNA origami via co-translational folding, or being able to attach DNA-like strands 
in precise locations on protein surfaces to enable complex protein-based chem- 
ical reaction networks (CRNs) in vivo. Furthermore, biotechnological production 
of proteins is generally more scalable than that of nucleic acids. Perhaps one day we 
will grow designer nanoscale machines built with principles of DNA programming 
with the ease and cost of a cheese factory. 


3 New Polymer 3 (NP3) 


Design. DNA, NP1, and NP2 use hydrogen bonds in recognition elements. This puts 
upper bounds on the thermal stability of architectures designed with these polymers. 
NP3 aims to make a covalent recognition code for applications requiring very stable 
architectures. Is it possible to use stronger covalent bonds instead of weaker non-covalent 
ones to encode recognition between two strands of a digital polymer? Nature 
evolved to rely on weaker non-covalent interactions in many biological digital polymers, 
presumably to enable many dynamic behaviors at physiological temperatures. 
For example, the base pairing energy in DNA (hydrogen bonding + π-π stacking) 
is typically in the 0.4-20.0 kJ/mol range, and more than 99.999% of proteins have free 
energy of folding below 33.5 kJ/mol [41]. Covalent bonds are an order of magnitude 
stronger, for example: N-H (386 kJ/mol), C-N (305 kJ/mol), C=C (602 kJ/mol), 
and C=N (615 kJ/mol). The main concern one may imagine with using covalent 
bonds for constructing a recognition code is that it would not allow error correction, 
i.e., a single incorrect bond will be so strong that it would prevent the system from 
reaching the desired global thermodynamic minimum. The strongest non-covalent 
DNA bonds are labile above 90 °C, allowing the global minimum to be reached by 
slow annealing. Thermal annealing of covalently bound structures will likely destroy 
the constituent molecules. However, some covalent bonds can be made labile under 
mild conditions at room temperature, for example, in the presence of catalysts that 
lower their transformation activation energy. These bonds are known as “dynamic 
covalent.” I propose to use dynamic covalent bonds to construct NP3 (Fig. 4). 
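
One rough way to see why thermal annealing works for base pairs but not for covalent bonds is to express the energies quoted above in units of the thermal energy RT. The sketch below does only that conversion. It is a yardstick under stated assumptions, not a kinetic model: the bond energies are the values given in the text, and the function and dictionary names are hypothetical.

```python
GAS_CONSTANT_KJ = 8.314462618e-3  # molar gas constant in kJ/(mol*K)


def energy_in_rt(energy_kj_per_mol: float, temperature_c: float = 25.0) -> float:
    """Express an energy as a multiple of RT at the given temperature."""
    temperature_k = temperature_c + 273.15
    return energy_kj_per_mol / (GAS_CONSTANT_KJ * temperature_k)


# Energies (kJ/mol) as quoted in the text.
ENERGIES_KJ_PER_MOL = {
    "DNA base pairing (upper end)": 20.0,
    "protein folding (upper bound)": 33.5,
    "C-N": 305.0,
    "N-H": 386.0,
    "C=C": 602.0,
    "C=N": 615.0,
}

for label, energy in ENERGIES_KJ_PER_MOL.items():
    print(f"{label}: {energy_in_rt(energy):.0f} RT at 25 C, "
          f"{energy_in_rt(energy, 90.0):.0f} RT at 90 C")
```

At room temperature the strongest base-pairing energy is on the order of 8 RT, whereas a C-N bond is more than 120 RT. Heating to 90 °C changes these ratios only modestly, which is consistent with the argument that covalent codes need a chemical rather than a thermal route to reversibility.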

Like NP2, NP3 uses only natural amino acids and relies on a 3-letter code 
constructed from heterophilic (S-D) and homophilic (C-C) interactions. Coincidentally, 
both ester and disulfide bonds can be rendered labile under the same 
conditions (low pH for the ester and a reducing environment for the S-S bond), which 
is not the case for many other pairs of dynamic covalent bonds that may require 
incompatible conditions to become labile simultaneously [49]. 

Existing approaches to design a simple recognition code with dynamic cova- 
lent bonds. Dynamic covalent bonds have been explored extensively for the purposes 
of drug discovery, material design, and other goals [50]. Substantial efforts have been 
directed toward creating two-letter [51, 52] and even four-letter [53] heterophilic 
codes. The main challenge has been overcoming kinetically trapped species, with 
some recent successes for two-letter systems [54]. 

Potential pitfalls and ways to overcome them. In case the example shown in 
Fig. 4 does not form, there are many degrees of design freedom to explore. For 
example, artificial amino acids such as catechol and boronic acid can be incorporated 
[52]. Artificial amino acids would also allow positioning groups that form dynamic 
covalent bonds on aromatic residues such as benzene, which can enable stacking 
bonds for a more regular periodicity of the duplex via π-π stacking, as in DNA. When 
planning synthesis, consideration should be given to how non-canonical amino acids 
for the recognition elements of NP2 might be efficiently protected and deprotected 
in conventional solid-phase peptide and peptoid syntheses. If protecting these groups 
is restrictive, post-polymerization reactions (e.g., thiol-ene click chemistry) to append 
desired recognition moieties can be explored. If one plans to use NP3 for in vivo 
applications, one should carefully plan the fate of the assemblies given potential 
swings in pH (e.g., acidic lysosomes), redox potential, etc. 

Fig. 4 Examples of dynamic covalent bonds (left and middle) and an example structure of New 
Polymer 3 duplex (right) made with ester and disulfide bonds from a self-complementary strand. 
Red Rs can potentially be any side chain 


4 Example Applications 


The new digital polymers described above will enable many new capabilities 
unattainable with DNA. Some of the capabilities were summarized in Table 1. Below 
are a few more examples. 

Activatable strand displacement. Overcoming leak reactions in strand displace- 
ment circuits is a significant challenge for dynamic DNA nanotechnology. One 
way to reduce or eliminate leak is to introduce a single covalent base pair such 
as cysteine-cysteine (C-C) from NP3 to a branch migration domain of a double-stranded 
complex based on non-covalent bonds (NP1 or NP2). The displacement 
will proceed only if the one special base is activated, e.g., with a reducing agent. The 
activation can be designed to be controlled by light [55] and other stimuli, expanding 
the complexity and versatility of dynamic behaviors that can be implemented with 
molecular programming. 

Bio-orthogonality for in vivo molecular programming. As mentioned above, 
most nucleic acid analogs such as PNA were designed to bind nucleic acids. This 
limits their applications in vivo due to their interference with endogenous chemistry. 
NP2 and NP3 can be used to build a system for molecular programming in vivo that 
is orthogonal to existing nucleic acids. 

Enzymeless translation and self-replication. The covalent code of NP3 makes 
binding by a single monomer stable. Simply by mixing a single template strand with 
free monomers, a new complementary strand can be templated. A ligation mechanism 
can be introduced (e.g., via terminal alkene metathesis) that would zip the new 
templated strand. After this, the complex can be “melted” via labilization of the 
covalent bonds, and the process repeated. Depending on whether the new monomers 
are the same as or different from monomers in the templating strand, this process can 
be conceptually viewed as replication or translation. Neither process would require 
enzymes, unlike their biological counterparts. Enzyme-free information transfer in 
digital polymers has been a focus of experimental efforts for many [56]. Enzyme-free 
protein replication (protein PCR) has been but the wild dream of a few. 

Molecular breadboard with 100× the pattern density of DNA origami. Using 
a version of NP1, it is also possible to design a fully addressable breadboard with an 
average functionalization spacing of 0.6 nm (compared with current ~6 nm boards). I 
leave out the details of this design to stimulate the imagination of the reader. 
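
The factor of 100 follows from simple geometry: a tenfold reduction in spacing along each of two dimensions gives a hundredfold increase in sites per unit area. The sketch below spells out that arithmetic, assuming a regular square grid of addressable sites (an assumption made for illustration; the function name is hypothetical).

```python
def density_gain(current_spacing_nm, new_spacing_nm):
    """Return (linear gain, areal gain) for a square grid of addressable sites."""
    linear_gain = current_spacing_nm / new_spacing_nm
    return linear_gain, linear_gain ** 2

# Spacings quoted in the text: ~6 nm for current boards, 0.6 nm for the proposed one.
linear, areal = density_gain(6.0, 0.6)
print(f"linear gain: {linear:.0f}x, areal gain: {areal:.0f}x")
```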


5 Conclusions 


Digital technologies are becoming essential to our existence. The digital paradigm 
enables construction of extremely complex hardware and software from simple 
building blocks. The original digital code can be found in life. The digital nature of 
naturally-evolved polymers such as DNA and proteins is essential to the existence and 
proliferation of life, by allowing robust information flow through precise encoding of 
genetic/structural information in sequences of base pairs/amino acids. While molec- 
ular programming has so far been dominated by work with DNA, chemists have 
been developing artificial polymers for many decades, though only a few of these 
constitute digital polymers, in which the sequence is precisely controlled down to 
single monomers. The few synthetic digital polymers that do exist have been shown 
to be superior to their less precise non-digital counterparts for applications in mate- 
rials [57], molecular electronics [58], tracking [10], and molecular programming, 
because they allow precise control through their sequence of melting point, bandgap 
energy, compact mass-spec signature, and sequence-dependent recognition, respec- 
tively. Yet, despite this progress, these non-biological digital polymers are still largely 
underutilized. 

DNA will likely remain the molecule of choice for molecular programming for 
the foreseeable future due to its current practical advantages. But I hope that the ideas 
outlined here will encourage exploration of new digital polymers with expanded capa- 
bilities—both experimentally by chemists and theoretically by molecular program- 
mers. Nature initially used RNA and DNA as the substrate for life during the “RNA 
world” before evolving construction based on proteins, a paradigm shift that enabled 
much more diverse forms and functions. Similarly, the approaches outlined here 
could lead to more diverse nanotechnologies built with molecular programming. 


Acknowledgements I am grateful to Prof. Helen Tran, her lab, and other anonymous reviewers 
for their helpful critiques on the manuscript. 


Abstract The unique specificity of DNA interactions and our ability to synthesize 
artificial functionalized DNA sequences make DNA the ideal material for controlling 
self-assembly and chemical reactions of components attached to DNA sequences. 
Inspired by the field of molecular electronics and the lack of methods to assemble 
molecular components, we have explored the organization of conjugated molecular 
components using DNA-based self-assembly. In this chapter, we provide an overview 
of our efforts first to assemble and chemically couple conjugated molecules directed 
by DNA, and more recently to assemble conjugated polymers in DNA nanostructures. 
At the end of the chapter, we provide a short overview of work by other groups in 
the field. 


Molecular electronics is an interdisciplinary subject that aims to assemble elec- 
tronic components in a bottom-up approach using conducting molecules as building 
blocks. It represents an alternative to the lithography-based top-down preparation of 
silicon-based electronic circuits. A large variety of organic molecules with poten- 
tially useful electronic properties are available [1, 2]; however, one of the major 
obstacles to taking advantage of the unique properties of these molecules is the 
challenge associated with connecting the molecular components in a controllable 
manner. 

Conjugated oligomers and polymers have been used extensively in bulk organic 
electronics such as light-emitting diodes, field-effect transistors, and polymeric solar 
cells [3, 4]. The study of the single molecule properties of conjugated polymers is 
more limited; however, studies have been performed using, e.g., scanning tunneling 
microscopy [5] and single molecule spectroscopy [6, 7]. The major problem in 
the production of single-molecule-based electronics is the inability to arrange the 
components to form circuits. 

In a seminal theoretical paper from 1987, Robinson and Seeman proposed a design 
for an electronic molecular memory chip formed by DNA-guided self-assembly 
of conjugated polymers as wires and metal complexes as memory components [8]. 
Throughout the past 18 years, a major focus of our laboratory has been to implement, 
at least in part, Robinson and Seeman’s vision to use DNA-directed self-assembly 
to generate and/or position molecular conjugated wires [9]. DNA nanotechnology 
provides a useful tool for controlling and connecting individual conjugated oligomer 
and polymer molecules at the nanoscale [10]. In order to exploit DNA nanotechnology 
for controlling organic molecules, the molecules have to be functionalized 
with single-stranded DNA (ssDNA) sequences, which in turn direct the assembly of 
the molecules. In this paper, we provide an overview of our work in this area, 
and we summarize the contributions from others to the area at the end of the 
paper. 


1 Modular Self-Assembly of Molecular Components 


Our entry into this area was inspired by the intense interest in molecular electronics 
and the emergence of DNA-templated synthesis at the beginning of the new millennium 
[11, 12]. We envisioned that DNA-templated synthesis may serve as a tool to both 
assemble and couple individual components to a fully conjugated oligomer as shown 
in Fig. 1. 

Bunz and coworkers had shown that it was possible to assemble molecular rod 
oligomers using rod monomers displaying DNA at each terminus; however, the 
resulting oligomers were unconjugated as their rods were separated by DNA helices 
[13]. In 2001, Czlapinski and Sheppard reported on the DNA-directed coupling of 
salicylaldehydes into metal-salen complexes [14]. The metal-salen coupling was 
ideal for assembly of conjugated oligomers since the linkages are linear between the 
headgroups to form a conjugated oligomer. Furthermore, the electronic properties of 
the complexes should be interchangeable by changing the metal ion. 

In our first approach to solve this problem, reported in 2004, we described the 
development of a DNA-directed bottom-up method for programmed assembly that 
utilized covalent couplings between multiple organic modules [15, 16]. The basic 
building blocks are rigid conjugated modules with an oligo(phenylene ethynylene) 
backbone and have salicylaldehyde-derived termini. Additionally, each terminus 
contains a hydroxyl linker that is functionalized with either a 4,4’-dimethoxytrityl 
(DMT) protecting group or a phosphoramidite moiety. This allowed for incorporation 
of the module into the 3’-end of a DNA strand followed by removal of the DMT 
protecting group and synthesis of the second DNA strand. The modules were synthesized 
both as a linear oligonucleotide-functionalized module (LOM) and a tripodal 
oligonucleotide-functionalized module (TOM) (Fig. 2a). By using complementary 
strands on the modules, it was possible to direct the assembly without the need for 
additional DNA templates. The modules were covalently coupled by metal-salen 
formation between the salicylaldehydes at the termini of the modules by reaction 
with ethylenediamine (EDA) and a manganese salt. The salicylaldehyde groups of 
two modules are brought into close proximity when their complementary DNA strands 
are annealed together, allowing a pseudo-intramolecular reaction to occur (Fig. 2b). 

Fig. 1 Concept of DNA-directed assembly and coupling of organic rod monomers into conjugated 
organic oligomers (linear and tripodal black rods: molecular monomers; colored wavy lines: DNA) 

Fig. 2 a Chemical structure of LOM and TOM. LOM-1 contains two 15-mer sequences: a and b’. 
TOM-1 contains three 15-mer sequences: one c’ and two b. b Schematic illustration of the DNA-templated 
coupling. The first step is annealing of the two complementary strands, bringing the two 
reactive groups into close proximity. The next step is the formation of a covalent Mn-salen link 
between the two terminal salicylaldehydes by reaction with EDA and Mn(OAc)2. c Illustration of the 
linear oligomer; gel electrophoresis in 8 M urea shows that the Mn-salen products are covalently 
linked. d Analogous illustration of the tripodal constructs. Adapted with permission from [15]. 
Copyright (2004) American Chemical Society 

Fig. 3 a Illustration of DNA-directed coupling of LOSM by metal-salen formation, followed by 
cleavage of DNA strands by reaction with TCEP. Adapted from [18] with permission from the Royal 
Society of Chemistry. b Chemical structure of the elongated linear oligonucleotide-functionalized 
module (ELOM) 


The DNA sequences on the modules can be altered, which makes it possible 
to combine the LOMs and TOMs to form a variety of linear and branched prede- 
fined structures. Self-assembly and coupling of up to four differently encoded linear 
modules was successfully obtained (Fig. 2c). More complex structures were built by 
combining LOMs and TOMs (Fig. 2d). Up to three LOMs were successfully coupled 
to a single TOM. This method provides a unique degree of control of the architecture 
of the molecular wire; however, a major challenge of the approach was that there 
seemed to be a limit of assembling four modules. Despite many attempts, we never 
managed to make penta- or higher-order structures in reasonable yields. 
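
The way the DNA sequences encode which modules can meet is easy to express at the level of domain labels. The sketch below uses the labels given in the caption of Fig. 2 (LOM-1 carries a and b’, TOM-1 carries c’ and two copies of b) and lists every pair of arms whose sequences are complementary. It is a bookkeeping illustration only: the data structure and function names are hypothetical, and nothing about yields or chemistry is modeled.

```python
from itertools import combinations

def complement(domain):
    """Map a domain label to its complement: a <-> a', b <-> b', and so on."""
    return domain[:-1] if domain.endswith("'") else domain + "'"

# Domain labels as given in the Fig. 2 caption.
modules = {
    "LOM-1": ["a", "b'"],
    "TOM-1": ["c'", "b", "b"],
}

def possible_couplings(modules):
    """List (module, arm index, module, arm index) for every complementary pair of arms."""
    links = []
    for (name1, arms1), (name2, arms2) in combinations(modules.items(), 2):
        for i, d1 in enumerate(arms1):
            for j, d2 in enumerate(arms2):
                if complement(d1) == d2:
                    links.append((name1, i, name2, j))
    return links

for link in possible_couplings(modules):
    print(link)
```

With these two modules, the b’ arm of LOM-1 can anneal to either of the two b arms of TOM-1, while the a and c’ arms remain free for further modules carrying a’ or c.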

Research in this direction continued in an effort to improve the system. In 2005, 
Nielsen et al. [17] investigated the DNA-directed double reductive amination of 
salicylaldehydes in the presence of EDA to form tetrahydrosalen. This amine-linked 
structure was found to be much more stable toward acid, heat, methylamine, and 
ethylenediamine tetraacetic acid (EDTA). 

Additionally, the selective cleavage of DNA strands from the conjugated backbone 
was enabled by installing disulfide bridges between the DNA sequence and the LOM, 
called LOMS (Fig. 3a). The disulfide modified strands were successfully cleaved 
from the backbone by treatment with tris(2-carboxyethyl)phosphine (TCEP) [17, 
18]. 

The attempts to improve the system did not make it possible to assemble more than 
four modules with a satisfactory yield. We speculated that this may be caused by steric 
and charge repulsion between the large oligonucleotide duplexes, as the diameter of 
a duplex is around 2 nm which is the same as the length of a LOM. Therefore, 
it is expected that the duplexes will induce steric strain as the structures grow. One 
possible solution to this problem is to extend the length of the linear module. In 2006, 
Blakskjer et al. [19] reported the synthesis of an elongated linear oligonucleotide- 
functionalized module (ELOM) (Fig. 3b). The DNA-directed coupling between an 
ELOM and LOM was successful; however, attempts to couple two or three ELOMs 
were unsuccessful. It is proposed that this is due to the amphiphilic nature of the 
ELOMs. 

In 2008, a new strategy for DNA-programmed coupling of molecular rods was 
reported by Andersen et al. [20]. The aim of the new method was to make a simpler 
setup that only requires one conjugate to form dimers or higher-order structures. 
Two 10-mer oligonucleotide-functionalized rod-type modules were synthesized and 
arranged on a DNA template strand. This design places the rod modules approximately 
on the same side of the double helix, one helical turn apart. Inspired by 
the previous studies, the modules were functionalized with salicylaldehydes at each 
terminus. The rod was also functionalized with an activated ester in the middle of the 
structure, in order to couple to an amino-modified oligonucleotide. The templated 
coupling was tested by mixing a 20-mer template strand with two equivalents of reactive 
strand. The strands were annealed, followed by reaction with ethylenediamine 
and a metal salt to effect the pseudo-intramolecular coupling between the two reactive 
strands (Fig. 4a). Denaturation of the double helix allows for purification of the single-stranded 
coupled product. It was possible to isolate the desired Mn-salen dimer in 
10% yield, whereas the Ni-salen dimer was isolated in 25% yield. Additionally, it 
was also possible to form heterodimers by using two different DNA sequences. The 
preparation of a trimer was also attempted; however, it was not possible to identify 
the product by mass spectrometry. The starting materials were consumed in all reactions, 
which suggests that the low yield could be due to side reactions or loss of 
material due to aggregation. A more hydrophilic molecule could potentially solve 
the aggregation problems and allow for the formation of higher-order structures. An 
advantage of this method compared to the previously published method using LOM 
and TOM modules is that this method yields a coupled module with extending single 
strands. These strands could be hybridized to other DNA nanostructures, allowing 
precise positioning of the nanowire. While the synthesis was simpler than the LOM/TOM 
approach, it was limited by low yields and the inability to form structures more complex 
than a dimer. 
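
The statement that two modules spaced 10 nucleotides apart sit roughly on the same side of the helix, one turn apart, can be checked with textbook B-DNA parameters. The sketch below assumes a rise of 0.34 nm per base pair and 10.5 base pairs per turn, and that the two rods are attached at equivalent positions on consecutive 10-mers; these values and the function names are assumptions for illustration and are not taken from [20].

```python
RISE_NM_PER_BP = 0.34   # assumed canonical B-DNA rise per base pair
BP_PER_TURN = 10.5      # assumed canonical B-DNA helical repeat

def axial_separation_nm(n_bp):
    """Distance along the helix axis between two attachment points n_bp apart."""
    return n_bp * RISE_NM_PER_BP

def angular_offset_deg(n_bp):
    """Rotation around the helix axis between the two attachment points, in [0, 360)."""
    return (n_bp * 360.0 / BP_PER_TURN) % 360.0

n = 10  # spacing implied by two 10-mers on a 20-mer template
print(f"axial separation: {axial_separation_nm(n):.2f} nm")
print(f"angular offset:   {angular_offset_deg(n):.1f} degrees")
```

With these parameters the two attachment points are about 3.4 nm apart along the axis and within roughly 17 degrees of facing the same direction.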
Fig. 4 a DNA-templated dimerization of two molecular rods by metal-salen formation. Two reactive 
strands are annealed with a template strand to bring the molecular rods into close proximity. The 
rods are coupled by reaction with EDA and a metal salt. By denaturation, the template strand is 
removed, and the coupled product is purified by RP-HPLC. Adapted from [20] with permission 
from John Wiley and Sons. Copyright 2008 Wiley-VCH Verlag GmbH & Co. KGaA, Weinheim. b 
Illustration of 4-helix bundle with four modules 

In the continuous search for DNA structures that can serve as templates for DNA-directed 
couplings of conjugated organic molecules, interhelical couplings in a DNA 
4-helix bundle (4-HB) have been investigated [21]. A conjugated linear molecule 
containing terminal salicylaldehyde groups and a central activated ester was synthe- 
sized. The module was coupled to amino-modified DNA strands and incorporated 
into specific locations in a well-defined 4-HB, to allow for selective interhelical 
couplings (Fig. 4b). The coupling of the modules was tested with both a metal-salen 
and dihydrazone formation; the coupled products were then analyzed by denaturing 
PAGE analysis. The couplings were found to be very distance dependent, and no non- 
templated couplings were observed. With the metal-salen coupling, it was possible to 
obtain dimers in moderate yields; however, higher-order structures were not obtained. 
For the dihydrazone couplings, trimer formation was achieved. 

The coupling of salicylaldehydes to form metal-salen linkages has been successful 
in other contexts. In a very recent study, the single molecule electronic properties 
of Mn(II), Co(II), and Fe(II)-salen complexes in a break junction were studied. 
The study showed that the metal-salen bridge is a relatively poor conductor but 
that the conductivity is dependent on the nature of the metal [22]. Unfortunately, 
the metal-salen formation is reversible, and the complex is labile in aqueous media. 
Therefore, alternative coupling strategies have been investigated in order to form irre- 
versible, hydrolysis-resistant, and conjugated linkages between molecular modules. 
A method for DNA-directed formation of 1,3-diyne linkages between conjugated 
molecular building blocks has been reported by Ravnsbek et al. [23]. The 1,3-diynes 
can be obtained by a Glaser-Eglinton reaction between terminal alkynes. An 
oligo(phenylene ethynylene) molecular rod containing two terminal acetylene groups 
was synthesized to enable the formation of a conjugated linear oligomer. Additionally, 
each monomer was functionalized with DMT and phosphoramidite functionalities 
for incorporation into a DNA strand by automated oligonucleotide synthesis 
(Fig. 5a, b). A series of four 30-mer oligonucleotides were prepared by automated 
oligonucleotide synthesis, during which the phosphoramidite rod monomer 
was incorporated in the middle of the strand. This resulted in four oligonucleotide-functionalized 
diacetylene modules (ODM), each consisting of an organic module with two 
15-mer sequences, one in each terminal region (Fig. 5c). The DNA-directed Glaser-Eglinton 
reactions were performed between the different ODM strands. Denaturing 
PAGE analysis of the crude products shows the formation of dimer and trimer in 
high yield (Fig. 5d, lanes 2-4), while the tetramer shows a lower yield (Fig. 5d, lane 5). 
The lower yield is believed to be caused by the increasing electrostatic repulsion and 
steric hindrance between the increasing number of DNA strands. The electrostatic 
repulsion could be removed by changing the DNA strands to PNA strands; however, 
this would also result in a much less soluble molecule. The formed oligomers have 
a size of around 4-8 nm and could have an application as conducting nanowires. The 
oligomer contains single-stranded DNA at each terminus, which could be used as 
handles for specific positioning of the nanowire on a DNA origami structure. 

Despite the advances in DNA-directed synthesis, it was not possible to obtain 
structures longer than tetramers. Significantly longer sequence-specific oligomers 
have been prepared by DNA-directed synthesis by others; however, these oligomers 
were not conjugated, and only one DNA strand is linked to the products [24-26]. 

Fig. 5 a Chemical structure of the oligo(phenylene ethynylene) phosphoramidite monomer. b Illustration 
of annealing and subsequent Glaser-Eglinton coupling between two modified DNA strands. 
c Oligomerization of ODM monomers by multiple Glaser-Eglinton reactions. d Denaturing PAGE 
analysis of DNA-directed Glaser-Eglinton oligomerization of ODM sequences. Reproduced from 
[23] with permission from John Wiley and Sons. Copyright 2011 Wiley-VCH Verlag GmbH & Co. 
KGaA, Weinheim 


2 Conjugated Polymers on DNA Origami 


During the course of our work on the modular DNA-directed assembly, the DNA 
origami method was published by Rothemund in 2006 [27]. This method enabled 
the self-assembly of, from a DNA point of view, very large structures with unique 
addressability, such as the rectangular mono-layer origami with dimensions of 100 × 
70 nm². Such a structure appeared to be an almost perfect breadboard for assembly 
and coupling of the molecular modules described above. However, upon closer 
consideration, it would require extremely efficient immobilization and chemical 
cross-linking to form a continuous wire of 100 nm across the origami structures. 
The modules are only 2 nm long, and it would require immobilization of approximately 
50 modules and 49 coupling reactions that should all be successful to create 
a continuous wire. Especially in light of the poor yields obtained when attempting 
to template the assembly in a 4-helix sheet as described above [23], we chose to 
abandon the modular assembly strategy for creating larger wires on DNA origami. 
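
The severity of this requirement becomes clear when the per-step yields are multiplied. The sketch below computes the number of modules needed for a 100 nm wire and the overall yield of 49 consecutive couplings, assuming every coupling succeeds independently with the same probability and ignoring losses during immobilization. Both assumptions, the function names, and the example per-step yields are illustrative choices, not measured values.

```python
import math

def modules_needed(wire_nm, module_nm):
    """Number of modules of a given length needed to span the wire."""
    return math.ceil(wire_nm / module_nm)

def overall_yield(per_step_yield, n_steps):
    """Yield of n independent consecutive steps with identical per-step yield."""
    return per_step_yield ** n_steps

def required_step_yield(target_overall, n_steps):
    """Per-step yield needed to reach a target overall yield."""
    return target_overall ** (1.0 / n_steps)

n_modules = modules_needed(100, 2)   # 100 nm wire, 2 nm modules (from the text)
n_couplings = n_modules - 1

for step_yield in (0.50, 0.90, 0.99):   # hypothetical per-step yields
    print(f"per-step {step_yield:.2f} -> overall "
          f"{overall_yield(step_yield, n_couplings):.2e}")

print(f"per-step yield needed for 50% overall: "
      f"{required_step_yield(0.5, n_couplings):.4f}")
```

Even a 90% yield per coupling would leave well under 1% of fully connected wires in this simple model, and a 50% overall yield would require each coupling to proceed in more than 98% yield.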

Instead, we turned our attention to conjugated polymers that offer a continuous 
single molecular conjugated system that may be more than 100 nm long [28]. The 
drawback is the lack of full control over the length of the polymer; however, as long 
as it is possible to remove shorter polymers, this may be acceptable. In order to 
control the assembly of such a polymer on DNA origami, it was a requirement to 
conjugate DNA sequences along the polymer to create a graft-type polymer. Based on 
our previous experience with hydrophobic wires and DNA, we assumed that a high 
density of DNA along the polymer (one DNA strand per monomer) would be required 
to avoid aggregation and precipitation. The DNA strands could be attached to the 
polymer by three different methods: (1) premade DNA strands are coupled to the 
monomers which are then polymerized, (2) premade DNA strands with a chemical 
handle are coupled to a complementary functional group on a premade polymer, 
and (3) the polymer is immobilized on a solid support, and the DNA strands are 
synthesized on the polymer by automated synthesis. We decided to pursue the latter 
approach since we believed this would enable the synthesis of long polymers with a 
high density of DNA. 

As described by Knudsen et al. [29], poly(phenylene vinylene) containing a 
triethylene glycol (TEG) linker appended to each repeat unit was synthesized 
(Fig. 6). The TEG linkers were terminated with a tert-butyldiphenylsilyl (TBDPS) 
protecting group, and after deprotection, the terminal alcohol served as starting point 
for the DNA synthesis. Before DNA synthesis, the polymers were characterized 
by scanning tunneling microscopy (STM) (Fig. 7a) [30]. In addition to providing 
information about the molecular structure and composition of the polymer, this also 
showed that a fraction of the polymers was very long (>200 nm). 

By partially deprotecting the side chains and exposing approximately 20-40% 
of the hydroxyl groups, it was possible to immobilize the polymer onto a solid 
support through phosphoramidite chemistry. This was followed by deprotection of 
the remaining side chains and synthesis of ssDNA directly onto the side chains of the 
polymer through automated solid phase oligonucleotide synthesis. Upon cleavage, 
deprotection, and purification, a water-soluble material was obtained (poly(APPV- 
DNA)). 

By this method, we obtained very long DNA grafted polymers of around 200 nm 
that disperse well on a mica surface as shown in the AFM image in Fig. 7b. It was later 
discovered that the polymers had a tendency to aggregate in the presence of divalent 
cations, which significantly lowered the intensity of the emission [31]. The polymers 
were purified by size exclusion chromatography, and the fractions containing the 
longer polymers were selected for further experiments. 

In order to control the immobilization of the polymer along a specific track on 
DNA origami, we used Rothemund’s rectangular DNA origami structure containing 
a line of single-stranded DNA extensions spaced every 5 nm on the origami surface. The 
density of DNA on the polymer is much higher, as approximately 2/3 of the repeat 
units of the polymer carry a 9-mer sequence. The thermal stability of the attachment 
was greater than the 9-mer’s melting temperature because of the polyvalent binding 
of these oligos to the origami. 

Fig. 6 Synthesis of poly(APPV-DNA). The PPV polymer (4a) is synthesized by a dithiocarbamate 
route. Partial TBDPS deprotection to give 4b allows immobilization on a phosphoramidite-functionalized 
CPG support. After removal of the remaining protecting groups, ssDNA sequences 
are synthesized on the polymer by automated DNA synthesis. As the last step, poly(APPV-DNA) is 
deprotected and cleaved from the solid support. Reused with permission from MacMillan Publishers 
Ltd: [Nature Nanotechnology] [29] copyright (2015) 

As shown in Fig. 8a, the polymer was efficiently immobilized along designed 
linear, bent, and U-shaped tracks on the DNA origami [29]. The polymer was also 
routed on a 3D barrel-shaped DNA origami structure designed by Wickham et al. 
[32]. The 3D shape of the polymer was characterized by PAINT super-resolution 
imaging as shown in Fig. 8b. 

Fig. 7 a Left: STM image of PPV-TBDPS (4a) on Au(111). Right: schematic map showing the 
different polymer strands and their respective lengths. Polymer strands are only measured as far as 
the edges of the STM image field. Reproduced from [30] with permission from The Royal Society 
of Chemistry. b Left: AFM topography image showing poly(APPV-DNA) dispersed onto a mica 
surface. Right: height measurements of poly(APPV-DNA) on the mica surface. Scale bar = 100 nm. 
Reused with permission from MacMillan Publishers Ltd: [Nature Nanotechnology] [29] copyright 
(2015) 

In further studies, a method for controlling the dynamics of single polymers 
on DNA origami was developed by Krissanaprasit et al. [33]. It was possible to 
switch the conformation of single polymer molecules by toehold-mediated strand 
displacement reactions. The polymer could be directed to one of two tracks on a 
DNA origami by addition of linker strands. The linker strands could be removed by 
toehold-mediated strand displacement with remover strands, allowing switching to the 
other track (Fig. 9). 
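
The switching logic described here can be summarized as a small state machine: linker strands attach the polymer to a track, and remover strands strip those linkers so that the polymer is free to be directed elsewhere. The sketch below is a toy model of that logic only. The class, method, and track names are hypothetical, and no kinetics, stoichiometry, or incomplete displacement is modeled.

```python
class SwitchablePolymer:
    """Toy two-track model of linker/remover switching; not a kinetic simulation."""

    TRACKS = ("track_1", "track_2")  # hypothetical labels for the two tracks

    def __init__(self):
        self.linked_to = None  # track currently holding the polymer, if any

    def add_linkers(self, track):
        """Linker strands bind the polymer to a track if it is currently free."""
        if track not in self.TRACKS:
            raise ValueError(f"unknown track: {track}")
        if self.linked_to is None:
            self.linked_to = track
        return self.linked_to

    def add_removers(self, track):
        """Remover strands displace the linkers of the given track."""
        if self.linked_to == track:
            self.linked_to = None
        return self.linked_to


def switch(polymer, target):
    """Release the polymer from its current track (if any), then link it to the target."""
    if polymer.linked_to not in (None, target):
        polymer.add_removers(polymer.linked_to)
    return polymer.add_linkers(target)


polymer = SwitchablePolymer()
history = [switch(polymer, SwitchablePolymer.TRACKS[i % 2]) for i in range(6)]
print(history)
```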

Immobilizing conjugated polymers onto a DNA origami platform allows the 
discovery of new properties of these polymers. One of the goals motivating the devel- 
opment of DNA grafted conjugated polymers is to enable characterization of intra- 
and intermolecular energy transfer for single polymer molecules. For this purpose, a 
fluorene-based DNA grafted polymer (poly(F-DNA)) was synthesized using the same 
approach as for the poly(APPV-DNA) [34]. By positioning both the poly(F-DNA) 
and poly(APPV-DNA) on the same origami structure, the energy transfer between 
poly(F-DNA) and poly(APPV-DNA) could be investigated (Fig. 10a). When polymers 
were immobilized on opposite sides of the DNA origami, no energy transfer was 
detected. This is believed to be due to a lack of physical contact between the polymers. 
Therefore, the energy transfer was tested in solution by using complementary DNA 
strands on the polymers. Energy transfer from poly(F-DNA) to poly(APPV-DNA) 
was observed with a relative efficiency of 37%. It is assumed that the complementary 
DNA grafted polymers form a multi-polymer particle, which means that the energy 
transfer most likely is not taking place between individual polymers (Fig. 10b). 

Fig. 8 a Top: Illustrations of poly(APPV-DNA) immobilized in designed patterns on flat 
rectangular origami. Bottom: AFM topography images of these structures. b Left: 3D DNA 
PAINT of guide staple strands on the DNA structure. Right: 3D DNA PAINT of the polymer attached 
to the DNA structure. Scale bar = 50 nm. Reused with permission from MacMillan Publishers Ltd: 
[Nature Nanotechnology] [29] copyright (2015) 

In order to investigate the intermolecular energy transfer between individual poly- 
mers, immobilization onto the same side of an origami structure is necessary to both 
obtain physical contact between the polymers and control over the stoichiometry. 
Ideally, this setup can be improved in the future to arrange multiple different conjugated 
polymers on one origami platform and transfer energy to a final acceptor. This will 
allow harvesting light energy across a broad range of wavelengths to exploit the white 
light of the sun. 

The nanoscale transport of light in photosynthesis is a fundamental process where 
light energy is converted to chemical energy and is a prototypical “green” source 
of energy. Thus, the ability to harvest light at the nanoscale has been investigated 
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Fig. 9 Top Illustration of the conformational switching of single polymer molecules on DNA 
origami by toehold-mediated strand displacement reactions. Bottom FRET measurements of the 
switching showing six successful events. Reprinted with permission from [33]. Copyright (2016) 
American Chemical Society 


intensely [35, 36]; however, most research has focused on energy transfer cascades 
using small molecule dyes. In a recent report by Madsen et al. [37], a single molecule 
polymer poly(APPV-DNA) was investigated for its properties as a photonic wire. The 
DNA grafted polymer was immobilized onto a single DNA origami by hybridization 
to a track of single stranded staple strands extending from the origami structure. On 
the same origami structure, donor and acceptor fluorophores were placed at specific
positions along the polymer, allowing for energy transfer from the donor fluorophores
to the polymer, through the polymer, and from the polymer to an acceptor fluorophore
(Fig. 11a). The energy transfer was studied by both ensemble fluorescence
spectroscopy and single molecule spectroscopy. The distance dependence of energy
transfer through poly(APPV-DNA) was tested by investigating six different donor-acceptor
distances on a DNA origami platform. Distances ranging from 10.5 nm
to 24 nm were investigated, and it was found that the energy transfer efficiency 
Fig. 10 a Illustration and AFM topography image of poly(F-DNA) and poly(APPV-DNA) immobilized
in an orthogonal alignment on opposite surfaces of the same flat rectangular origami structure.
Scale bar = 300 nm. b Energy transfer between poly(F-DNA) and poly(APPV-DNA). Left: Illustra- 
tion showing that sequence complementarity results in multiple-strand complexes between the DNA 
grafted polymers, whereas non-complementary strands result in separate DNA grafted polymers. 
Right: Quantification of energy transfer from poly(F-DNA) to poly(APPV-DNA). Reproduced from 
[34] with permission from John Wiley and Sons. Copyright 2017 Wiley-VCH Verlag GmbH & Co. 
KGaA, Weinheim 


decreased with increased distance (Fig. 11b). Interestingly, energy transfer was still 
observed at 24 nm distance between donor and acceptor. As the energy transfer effi- 
ciency did not approach zero at the longest distance, it is expected that energy transfer 
at longer distances should be feasible. This intramolecular energy transfer over
tens of nanometers makes poly(APPV-DNA) a promising antenna molecule for use in light harvesting
nanodevices.
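
To put the 24 nm observation in context, the following minimal sketch evaluates the standard Förster point-dipole expression, E = 1 / (1 + (r/R0)^6), at the shortest and longest donor-acceptor distances mentioned above. The Förster radius R0 = 5 nm is an assumed, typical value for a pair of small-molecule dyes and is not taken from [37]; the function name is purely illustrative.

```python
def forster_efficiency(r_nm: float, r0_nm: float = 5.0) -> float:
    """FRET efficiency of one donor-acceptor pair in the point-dipole Foerster model.

    r_nm  : donor-acceptor separation in nm
    r0_nm : Foerster radius in nm (assumed value, not from the source)
    """
    return 1.0 / (1.0 + (r_nm / r0_nm) ** 6)


# Shortest and longest distances quoted in the text
for r in (10.5, 24.0):
    print(f"r = {r:4.1f} nm -> E = {forster_efficiency(r):.5f}")
```

Under this assumption, a single direct dye-to-dye Förster step would be expected to give roughly 1% efficiency at 10.5 nm and well below 0.01% at 24 nm, which illustrates why measurable transfer at 24 nm is consistent with transport through the polymer rather than direct transfer between the dyes.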
Fig. 11 a Schematic illustration of poly(APPV-DNA) hybridized to a DNA origami platform
containing donor and acceptor fluorophores. Upon excitation of donor fluorophores energy is transferred
to the polymer and finally to the acceptor fluorophore in the middle of the origami. b Data showing
the distance dependence of energy transfer through poly(APPV-DNA). Reprinted with permission from
[37]. Copyright (2021) American Chemical Society
3 Work from Other Groups


Other groups have also contributed to the work within this field. Sleiman’s group
has shown that nucleobase templated polymerization can be used to control the
length and polydispersity of a conjugated polymer [38]. A templated polymer with
controlled molecular weight and narrow polydispersity was synthesized by a living
polymerization method. A thymine displaying template polymer was synthesized by 
ring-opening metathesis polymerization. Adenine-modified conjugated monomers 
were aligned on the template polymer by hydrogen-bonding interactions. Subsequent 
Sonogashira polymerization leads to the synthesis of a conjugated polymer (Fig. 12). 
The daughter polymer was found to have a narrow molecular weight distribution and 
a chain length of around 25 monomeric units, close to the length of the template 
polymer. This is in contrast to non-templated polymerization, or polymerization with 
an incorrect template, which gave rise to short polymers with high polydispersities. 
This method is a very useful tool in the synthesis of conjugated polymers of a defined 
length, and it could potentially enable the synthesis of sequence specific polymers by 
exploiting all four nucleobases. However, the method will be limited by the number 
of different monomers that can be incorporated. Additionally, this method does not 
allow for incorporation of a conjugated polymer into other DNA nanostructures due 
to the lack of ssDNA sequences on the polymer. 

More recently, other groups have studied conjugated polymers and oligomers 
using DNA origami platforms. Mertig’s group has synthesized end-functionalized 
polythiophenes [39]. A single DNA strand was attached to the end of each individual 
polymer, and these end-functionalized polymers were then immobilized onto a DNA 
origami in different patterns. They were able to demonstrate that the optical properties 
of densely immobilized conjugated polymers can be fine-tuned by controlling the 
π-π stacking interactions between the polymers. Addition of surfactant molecules
Fig. 12 Templated Sonogashira polymerization of adenine-containing monomers by hydrogen
bonding to a thymine-containing template polymer. Adapted with permission from [38]. Copyright
(2009) American Chemical Society
Fig. 13 a Scheme of surfactant-induced breakup of stacked polythiophene backbones on a DNA
origami platform. DDAO = N,N-dimethyldodecylamine N-oxide. b Fluorescence emission spectra
of polythiophene at different DDAO concentrations (0.0, 0.003, 0.03, 0.3 wt%). Fluorescence
increases with increasing DDAO concentration. Reprinted with permission from [39]. Copyright
(2017) American Chemical Society


was able to break up the stacked polythiophene backbones and thus enhanced the 
fluorescent emission of the polymers (Fig. 13a, b). 

The Seeman and Canary groups have presented the synthesis of aniline octamers, 
which were functionalized with ssDNA sequences at each end [40]. The DNA
oligoaniline conjugates were successfully incorporated into 3D DNA crystals. It was 
possible to switch between different oxidation states of the oligoaniline by chemical 
treatment, and the oxidation state could be determined by the visual appearance of the 
DNA crystal (Fig. 14a). This reversible switching opens up the opportunity of controlling
the conductivity of DNA-based systems. However, the electronic properties of
the DNA structure have not been tested experimentally.

More recently, the same groups reported the preparation of a DNA origami- 
based molecular electro-optical modulator [41]. Two different types of conjugated 
oligomers were synthesized and functionalized with DNA sequences at each end. The 
DNA functionalized oligomers were incorporated into a flat DNA origami structure 
with a central cavity and formed an “X”-shape. They showed that it was possible to 
reversibly alter the fluorescence signal output by redox reactions (Fig. 14b). 


4 Conclusion 


The combined work in this field has developed unique tools to handle conju- 
gated oligomers and polymers at the single molecule level within the field of DNA 
nanotechnology. The original vision was to develop methods to realize the assembly 
and coupling of wires and components for molecular electronics, but we are still far 
away from this, since we have not yet been able to characterize the conductivity of 
the oligomers or polymers. First of all, it is extremely difficult to make two contacts 
to a single molecule organic oligomer/polymer to measure conductivity. This kind 
of measurement has been realized by STM in ultra-high vacuum [6], and it has 
also been shown for carbon nanotubes on DNA origami [42]. However, in spite of 
Fig. 14 a Image of DNA crystals with incorporated oligoaniline. Changing the charge and oxidation
state of oligoaniline leads to color changes directly correlated to the oligoaniline form present.
Reproduced from [40] with permission from John Wiley and Sons. Copyright 2017 Wiley-VCH
Verlag GmbH & Co. KGaA, Weinheim. b Illustration of DNA origami-based molecular electro-optical
modulator. The DNA grafted oligomers hybridize to extended staple strands and form an
“X” shape. Redox reactions make it possible to tune the fluorescence intensity of the modulator.
Reprinted with permission from [41]. Copyright (2018) American Chemical Society


several attempts, we have not been able to reliably measure the conductivity through 
any of the oligomers/polymers described above. Secondly, the polymers are semi- 
conductors, and therefore, it is doubtful that the single molecule polymers would 
show efficient conductivity in the absence of dopants. On the other hand, we believe 
that the structural control of the conjugated polymers has great potential for making 
single molecule optical circuits. As we have recently shown, it is possible to transfer 
excitation from one dye to another through the polymers [37]. For future studies, 
we believe that the key utility will be to build systems for light harvesting. With the 
spatial and chemical control that this approach offers, it may become possible to
build systems that mimic photosynthesis. 
Abstract The seminal recognition by Ned Seeman that DNA could be programmed
via base-pairing to form higher order structures is well known. What may have been
partially forgotten is that one of Dr. Seeman’s strong motivations for forming precise
and programmable nanostructures was to create nanoelectronic devices. This moti- 
vation is particularly apt given that modern electronic devices require precision posi- 
tioning of conductive elements to modulate and control electronic properties, and 
that such positioning is inherently limited by the scaling of photoresist technolo- 
gies: DNA may literally be one of the few ways to make devices smaller (Liddle 
and Gallatin in Nanoscale 3:2679-2688 [1]). As with many other insights regarding 
DNA at the nanoscale, Ned Seeman recognized the possibilities of DNA-templated 
electronic devices as early as 1987 (Robinson and Seeman in Protein Eng. 1:295- 
300 [2]). As of 2002, Braun’s group attempted to develop methods for lithography 
that involved metalating DNA (Keren et al. in Science 297:72-75 [3]). However, 
this instance involved linear, double-stranded DNA, in which portions were sepa- 
rated using RecA, and thus, the overall complexity of the lithography was limited. 
Since then, the extraordinary control afforded by DNA nanotechnology has provided 
equally interesting opportunities for creating complex electronic circuitry, either via 
turning DNA into an electronic device itself (Gates et al. in Crit. Rev. Anal. Chem. 
44:354-370 [4]), or by having DNA organize other materials (Hu and Niemeyer 
in Adv. Mat. 31(26), [5]) that can be electronic devices (Dai et al. in Nano Lett. 
20:5604-5615 [6]).


1 Origami’s Rise 


While Ned Seeman can be almost uniquely credited with the idea that DNA could be
designed to assemble into higher order supramolecular structures [7], the means by 
which these structures would be created has varied over the years [8, 9]. Early efforts 
to generate non-extensible structures (such as the Olympian Borromean rings [10]) 
were followed by the creation of rigid, modular DNA elements based on more rigid
crossover structures resembling Holliday junctions (the so-called double crossover
or DX tile; Fig. 1a [11]) that could be assembled as building blocks into larger architectures
[12] (Fig. 1b). Attempts to precisely control architecture from the atomic
level upwards were further encouraged by Rothemund’s discovery that short oligonu- 
cleotide “staples” could be used to fold longer DNAs into defined “origami” shapes 
[13] (Fig. 1c). This realization has since exploded into a huge range of architec- 
tures, both two- and three-dimensional [14], marvelous artistic endeavors [15], and 
a range of potential application areas for origami [16]. Further extending the depar- 
ture from atomic level control, there is even tantalizing evidence that extensible 
three-dimensional crystal structures can arise, albeit somewhat fortuitously, allowing 
enhanced engineering of DNA lattice dimensions [17]. These progressive advances 
in atomic scale control over complex structure now provide intriguing opportunities 
for configuring electronics. 


Fig. 1 Progression of DNA origami shapes through technology. a Seeman’s double-crossover 
[18] junction, one of five original designs. Self-assembly occurs via a mechanism similar to Holl- 
iday junction formation, but these synthetic molecules constitute a new class of DNA structures. 
Adapted with permission from [11]. Copyright [11] American Chemical Society. b Seeman’s group 
self-assembling DNA origami composed of repeating two DNA units, A and B, which form into 
origami strips with a 33 nm periodicity, as measured by AFM. Scale bar is 300 nm. Adapted by 
permission [12], Copyright 1998, Springer. c Since then, DNA origami has become more complex
and controlled. Rothemund utilized computer-aided design and DNA staples to fabricate advanced
shapes, including (from left to right): squares, rectangles, stars, smiley faces, triangles with rectangular
domains, and trapezoidal domains with bridges between them. Color indicates the base-pair
index along the folding path; red is the 1st base, purple the 7,000th. Top row of the AFM images is
165 nm × 165 nm, and lower row scale bars are 100 nm, with the exception of the rectangle, which
is 1 μm. Adapted by permission [13], Copyright 2006, Springer
2 Making DNA Nanostructures Conductive Through
Metallization 


In order to make DNA nanostructures into electronic devices, one approach is to 
determine how to make DNA conductive. While there is some evidence that DNA 
itself can be a conductive material [19, 20], its conductivity is relatively low [21]. 
To solve this problem, nucleic acids can interact with metals (or be modified to 
interact with metals), and if completely metalated can readily pass current. To this 
end, Yan et al. constructed nanogrid and nanoribbon lattices from four arm junctions 
and used a two-step procedure involving aldehyde derivatization of the DNA, silver 
seeding, and then complete silver metalation to create conductive nanowires [22] 
(Fig. 2a). The metallized nanowires showed a linear, ohmic current-voltage (I-V)
profile using a two-terminal setup. The bulk resistivities of the nanoribbon structures
were 2.4 × 10⁻⁶ Ω·m, roughly 100-fold more than polycrystalline silver (1.6 × 10⁻⁸
Ω·m). The higher resistance could be due to the difficulty in calculating the true unit
cross-section of the silver, granularity in the silver structure, or low densities of the 
metals following deposition. Nonetheless, this was one of the first demonstrations 
that organized DNA assemblies might eventually prove useful for the creation of 
programmable electronic devices. Following up this feat, the LaBean lab then created 
and metalated nanotubes (Fig. 2b) [23] based on triple-crossover tiles (TX, previously 
developed in collaboration with the Seeman lab [24]). These structures were found 
to have around a tenfold higher resistance compared to the nanoribbon strategy
but still showed the ohmic behavior expected of a conductive metal
structure.
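
The bulk resistivity of such a wire follows from the measured two-terminal resistance and the wire geometry as ρ = R·A/L. The sketch below illustrates the arithmetic and the comparison with bulk silver. The resistance and wire dimensions are hypothetical values chosen only so that the result falls in the range discussed above; they are not the measurements reported in [22]. The silver value is the commonly tabulated bulk resistivity.

```python
RHO_SILVER = 1.6e-8  # ohm*m, commonly tabulated bulk resistivity of silver


def bulk_resistivity(resistance_ohm, width_m, height_m, length_m):
    """Resistivity rho = R * A / L for a wire with rectangular cross-section."""
    cross_section = width_m * height_m
    return resistance_ohm * cross_section / length_m


# Hypothetical wire (illustrative numbers, not from the source):
# 1 kOhm across a 40 nm x 30 nm ribbon with 0.5 um between the contacts.
rho = bulk_resistivity(1e3, 40e-9, 30e-9, 0.5e-6)
print(f"rho = {rho:.2e} ohm*m, {rho / RHO_SILVER:.0f}x bulk silver")
```

Because the cross-section enters linearly, any overestimate of the metal-filled area (for example from granularity or voids) inflates the calculated resistivity by the same factor, which is one of the explanations offered above.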

In an exciting recent result, Shani et al. became the first to report supercon- 
ducting nanowires based on metalated DNA [25]. Niobium nitride, a material known 
to become a superconductor below 16 K, was deposited onto the surface of DNA 
using magnetron sputtering (Fig. 2c). The coated DNA exhibited a superconducting 
transition at approximately 5 K, ascribed to thermally activated phase slips, with 
further improvements in conductivity with the onset of quantum phase slips at 3.7 K. 
The DNA-templated superconductor exhibited maximal conductivity at the lowest 
measured temperature of 2.2 K and also showed a large negative magnetoresistance, 
further indicative of superconducting properties. 

Conductivity alone is only one aspect of an electronic device, and thus, there is 
a continuing need to direct particular electron flows. The irregular nature of DNA 
origami provides opportunities for creating virtually any pattern, and the Harb and 
Wooley labs in turn metalated [26] and demonstrated the conductivity [27] of origami, 
including plating branched structures (Fig. 2d). Of course, mere patterning is not 
necessarily sufficient to create functional electronic junctions. Strategies for diversi- 
fying electronic interfaces include hybridization of DNA-conjugated gold nanoparti- 
cles to allow site-specific seeding [28] and organic masking to deposit different metals 
[29]. Another approach has been to sacrifice continuity of charge transport in favor 
of creating specific metalated structures side-by-side on “nanoscale printed circuit 
boards” [30]. The characterization of conductance on C-shaped origami nanowires 
Fig. 2 Making DNA electrically conductive via metallization. a The first metallized DNA origami
structure. LaBean’s group was the first to achieve this milestone. DNA ribbons composed of 4 ×
4 DNA tiles were silver seeded, and the conductivity was measured using current-voltage spectroscopy
(a inset). Scale bar 2 μm. Reprinted/adapted by permission from [22]. Reprinted with
permission from AAAS. b this was followed up by the same group using silver metalated DNA 
nanotubes instead of their previous DNA ribbons. The conductivity (b inset) was calculated to be 
tenfold lower than the ribbons. Adapted from [23]. Copyright 2004, National Academy of Sciences. 
c the first superconducting metalated DNA nanowire was published by Shani et al. The left image 
shows the HR-SEM of the niobium nitride coated DNA nanowire suspended on a black channel. 
The right image shows the resistance measurements as a function of temperature, with special reference
to the thermally activated phase slips (TAPS) and quantum phase slips (QPS) associated with
superconductivity. Adapted from [25] with permission, CC BY 4.0. d More complex DNA origami
shapes can also be metalated and still show electrical conductivity. The left image shows the AFM 
images of the DNA origami CC structures with the corresponding histogram of gold-metalated 
resistance values for different devices. The number of metalated CC connections between each 
electrode was counted (red) and the total resistance calculated (blue) [32]. Adapted with permission 
from [27]. Copyright 2013, American Chemical Society 
has indicated a variety of modes of charge transport, including hopping, thermionic,
and tunneling mechanisms [31]. 

Despite the wealth of shapes and patterns afforded by DNA, a variety of
methods for casting and plating, and numerous possible charge transport mechanisms,
nanoscale electronic devices have for the most part not yet emerged from attempts
to generate conductive DNA. This is in part because of the difficulties in actually
forming the equivalent of atomic scale bandgap p-n junctions that are key to all electronics
(and, hauntingly, were also key to Seeman’s original dreams for self-assembled
memory devices [2]).

While these difficulties may yet be overcome via even more refined and specific 
methods for metal placement and coating on DNA, an alternative approach has arisen 
in which the key advantage of nanoscale placement, rather than nanoscale patterning, 
is emphasized. 


3 Decorating Origami 


Moore’s law is of course an observation, rather than a requirement, and suggests 
that the number of transistors within an integrated circuit will double approximately 
every two years. As we reach the limitations of lithography methods to achieve 
ever-increasing circuit densities for advanced computation, alternative fabrication 
techniques are being explored. In particular, DNA nanotechnology has the theoretical 
ability to pattern semiconducting elements with sub-nm resolution (since the width 
of ssDNA is ca. 0.9 nm) and could potentially be used in moderate throughput for 
the bottom-up, self-assembled fabrication of semiconducting chips. 


3.1 DNA Scaffolding for Conductive Metals 


While it has proven possible to make origami itself conductive, origami can also be 
used as an armature or scaffold for identifying, sorting, and positioning conductive 
elements. This has been achieved with a variety of DNA-material hybrids, including 
gold nanoparticles [33, 34], nanorods [35, 36], and quantum dots [37]. These hybrids 
can be further organized into higher order conductive structures. For example, Chad
Mirkin’s group created spherical nucleic acids (SNAs) [33], three-dimensional
nucleic acid nanostructures composed of a nanoparticle core (commonly gold)
densely functionalized with DNA. By functionalizing SNAs with DNA sequences
that could recognize planar nanoparticle clusters, Oleg Gang’s group created electronically
conductive stacked supramolecular assemblies [38] (Fig. 3a). These assemblies
were capable of forming filament-like pillar structures composed of alternating
SNAs and planar nanoparticles. By controlling the number, position, size, and composition
of the nanoparticles, different three-dimensional pillar architectures could be
produced with a variety of physical and electrical transport properties. 
Fig. 3 Using DNA origami to position different materials into highly ordered architectures. a Tian
et al. used SNAs to control the assembly of planar nanoparticle clusters into electrically conductive
pillars. The left image shows the SEM of these structures composed of two different clusters.
Cluster A has 3 SNAs, and cluster B has 4 SNAs. The right image shows the assembly scheme for
the two alternating clusters (A&B). Adapted with permission from [38]. Copyright 2017, American
Chemical Society. b The first publication, by Aryal et al., showing electrically connected metal-semiconductor
junctions. The left SEM image shows the smaller gold nanorods seeded onto the
DNA origami deposited onto silicon oxide wafers. Afterwards, the thinner CTAB-coated tellurium
nanorods were deposited within the gaps. The nanorods were electroless plated with gold to electrically
connect the structures. Current-voltage spectroscopy (right) exhibited a diode response
consistent with a Schottky junction. Adapted by permission [39], Copyright 2020, Springer. c This
is the first publication showing the placement of polymers into two- and three-dimensional architec- 
tures with DNA origami (Knudsen et al. [40]). The top image shows the APPV polymer conjugated 
to the ssDNA staples (extending from the phenylene groups). Using DNA origami placed on a 
substrate, the DNA staples route the polymer into specific designed architectures. The six smaller 
illustrations show the origami design and the polymer structures obtained. The original paper also 
contains AFM topography data showing correct routing. Adapted by permission [40], Copyright 
2015, Springer 
DNA origami has also been used as a scaffold to assemble single metal wires
composed of contiguous electrically connected metal-semiconductor junctions. In
2020, Aryal et al. [39] deposited DNA origami onto a silicon wafer and then positioned
three DNA-coated gold nanorods (100 nm × 20 nm) along the DNA with
increasing gaps between the nanorods (Fig. 3b). Next, tellurium nanorods were positioned
into the gaps between the gold nanorods via electrostatic interactions. Finally,
the gaps between the nanorods were filled via electroless gold plating, creating a
nanoscale wire composed of alternating Au-Te-Au junctions. When subjected to
current-voltage (I-V) electrical characterization, the response was non-linear and
non-ohmic, consistent with Schottky junction (also known as Schottky barrier diode) properties.
This was the first demonstration of how DNA origami can be used to assemble
nanoscale metal-semiconductor junctions.
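
The qualitative difference between an ohmic wire and a rectifying junction can be sketched with two textbook expressions: Ohm's law, I = V/R, and the ideal diode (thermionic-emission type) relation, I = I_s·(exp(V/(n·V_T)) − 1). The parameters below (resistance, saturation current, ideality factor) are arbitrary illustrative assumptions and are not fitted to the data in [39]; a real Au-Te-Au wire contains more than one junction, so this single-junction model is only a simplified picture.

```python
import math

V_THERMAL = 0.02585  # thermal voltage kT/q in volts at about 300 K


def ohmic_current(v, resistance_ohm=1e6):
    """Linear I-V response of a purely resistive (ohmic) wire."""
    return v / resistance_ohm


def diode_current(v, i_sat=1e-9, ideality=1.5):
    """Ideal diode equation, the usual first model for a Schottky junction."""
    return i_sat * (math.exp(v / (ideality * V_THERMAL)) - 1.0)


# Compare both models at symmetric forward and reverse bias
for v in (-0.2, -0.1, 0.0, 0.1, 0.2):
    print(f"V = {v:+.1f} V  ohmic = {ohmic_current(v):+.2e} A  "
          f"diode = {diode_current(v):+.2e} A")
```

The ohmic current is antisymmetric in the applied voltage, whereas the diode current saturates under reverse bias and grows exponentially under forward bias; this asymmetry is the signature referred to as a diode response.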


3.2 DNA Scaffolds for Conductive Polymers 


An alternative to introducing metals into DNA origami materials is the incorporation 
of conductive polymers into the origami. This was originally accomplished by the 
Gothelf and Dong labs; a conjugated paraphenylene vinylene (APPV) brush polymer 
that contained a nine nucleotide single-stranded DNA staple was synthesized [40]. 
To program the placement of the APPV onto a silicon oxide substrate and resultant 
architectures, DNA origami that could hybridize to the oligonucleotide attached to 
the polymer was constructed. Using this method, the authors were able to direct 
the placement and orientation of the APPV-DNA hybrid to create linear structures, 
90° curves, U-shapes, staircases, and circular designs (Fig. 3c). Ultimately, three-dimensional
cylindrical DNA origami structures were constructed, and APPV was
routed around them. Although no electrical conductivity measurements were performed
in this implementation, the surface potential of the APPV-DNA polymer chain
was measured to be -130 mV, implying a charge transfer higher than the underlying
silicon oxide. These results provide an excellent proof-of-principle for extension to
more conductive polymers and demonstrate the ability to create very compact three- 
dimensional structures capable of organizing conductive polymers into molecular- 
scale electronics, with the polymer providing a link to softer, flexible, and perhaps 
more biocompatible materials for biomedical engineering applications. 


3.3 DNA Scaffolds for Carbon Nanotubes 


The DNA-based placement of conductive and now semi-conductive metals is
powerful, but electronic applications remain elusive because of the need for scaling.
Scalable applications may yet arise via the placement of carbon nanotubes (CNTs), although
controlling the placement and positioning of CNTs in three dimensions
is incredibly complicated. Structural precision can potentially be achieved without
DNA using thin-film approaches, but these are prone to assembly defects such as
crossing, bundling, and non-conforming pitch [41], parameters essential for creating 
a precisely ordered three-dimensional CNT architecture for applications in integrated 
circuits. Utilization of DNA nanostructures as templates will allow greater precision 
in the nanofabrication process. If architectural problems are overcome, CNT-based 
computers can potentially outperform the best silicon devices [42], while being at 
least an order of magnitude more efficient [43]. This efficiency is incredibly impor- 
tant in regard to Dennard Scaling, which states that as a transistor reduces in size, the 
power density stays constant. This reduction in power usage allows manufacturers 
to increase the operating frequency of microprocessors and boost performance. Over 
the past 5 years, there has not been a significant increase in operating frequency, 
with IC manufacturers resorting to multi-core threading to overcome performance 
restrictions. Using new highly efficient materials such as CNTs would enable manu- 
facturers to go beyond silicon-based ICs and increase operating frequency with less 
impact on power requirements. 
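
The constant power density in Dennard scaling follows from simple arithmetic under the classical constant-field assumptions: when linear dimensions and supply voltage both shrink by a factor 1/k, capacitance scales as 1/k, switching frequency as k, and dynamic power per transistor (P = C·V²·f) as 1/k², the same factor by which the transistor area shrinks. The sketch below only restates these textbook relations; the function name and returned keys are illustrative.

```python
def dennard_scale(k: float) -> dict:
    """Relative quantities after shrinking dimensions and voltage by 1/k.

    All values are ratios with respect to the unscaled transistor.
    """
    capacitance = 1 / k          # C scales with linear dimension
    voltage = 1 / k              # constant-field scaling
    frequency = k                # gate delay shrinks by 1/k
    power = capacitance * voltage**2 * frequency  # dynamic power C*V^2*f
    area = 1 / k**2
    return {
        "frequency": frequency,
        "power_per_transistor": power,
        "area": area,
        "power_density": power / area,
    }


print(dennard_scale(2.0))
```

For k = 2 the power per transistor and the area both fall to one quarter, so the power density is unchanged while the frequency doubles. Once voltage can no longer be reduced in step with dimensions, this balance breaks, which is the context for the stagnating operating frequencies mentioned above.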

In 2003, Zheng et al. [44] reported that ssDNA can strongly interact with CNTs to 
form a stable complex; this advance most importantly allowed CNTs to be solubilized 
in aqueous solutions. The DNA-CNT hybrid architecture (wrapping) was found to be 
dependent on a short (10-45 nucleotide) GT-rich sequence which could form a two- 
dimensional sheet structure. Furthermore, it has been discovered that the chirality of 
the DNA around the CNT is dependent on the handedness of the helicity of the DNA 
[45]. Remarkably, GT-rich elements can selectively bind all 12 of the major chiral 
semiconducting CNT species [46]. The wrapping of the negatively charged DNA 
around the CNTs was also found to promote the selective separation of chirally pure 
CNTs via a positively charged anion exchange resin [46], potentially solving one of 
the major application challenges for CNTs, since chiral structures cannot be readily 
synthesized but have superior electrical conductivity and semiconducting properties 
[47]. 

These results set the stage for using DNA origami to fabricate highly specific and
oriented nanostructures with CNTs. An inherent problem with using CNTs is the
difficulty of organizing them into specific orientations. This complication was nicely
remedied by Erik Winfree’s group, who used DNA origami with “hook-binding
domains” to perpendicularly place two CNTs into a specific two-dimensional orientation
[48] (Fig. 4a). The alignment of the CNTs into cross-junctions positioned them
with 6 nm resolution and led to a stable field-effect transistor (FET)-like behavior, two 
firsts. This was quickly followed by the Goddard group using small-structured DNA 
linkers to establish highly dense parallel CNT arrays that could be self-assembled 
on the surface of mica (Fig. 4b). By tuning the length of the DNA linker, the CNT 
pitch could be tuned accordingly, with distances ranging from <3 nm to >20 nm [49]. 
Using a different DNA origami architecture that resulted in parallel placements, the 
Norton group utilized larger “blocks” of DNA origami that acted as linkers to orient 
two CNTs onto a one-dimensional origami construct, separating CNTs of ~100 nm 
length by more than 500 nm (Fig. 4c) [50]. Alternative methods of assembling CNTs 
on DNA origami templates using streptavidin-biotin interactions have also proven
successful in generating defined cross-junctions [51]. 
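
The dependence of CNT pitch on linker length can be checked with simple arithmetic. The two data points used below (20 bp linkers giving 8.5 nm pitch, 60 bp linkers giving 22 nm pitch) are those quoted in the caption of Fig. 4b for [49]; the straight-line model and the function names are assumptions made for illustration only.

```python
def fit_line(p1, p2):
    """Slope and intercept of the straight line through two (x, y) points."""
    (x1, y1), (x2, y2) = p1, p2
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1


# (linker length in bp, pitch in nm) as quoted for Fig. 4b
slope, offset = fit_line((20, 8.5), (60, 22.0))


def predicted_pitch(linker_bp):
    """Pitch in nm expected from the two-point linear model (an assumption)."""
    return offset + slope * linker_bp


print(f"slope = {slope:.4f} nm/bp, offset = {offset:.2f} nm")
print(f"predicted pitch for a 40 bp linker: {predicted_pitch(40):.2f} nm")
```

The slope obtained from these two points is close to the approximately 0.34 nm rise per base pair of B-form double-stranded DNA, consistent with the pitch being set by the length of a rigid duplex linker. The offset is simply what remains of a two-point fit and should be read as an effective constant rather than a measured quantity.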
Fig. 4 Using DNA origami to precisely position CNTs into two-dimensional structures. a The
seminal study by the Winfree group showing the first time DNA origami was used to position CNTs. 
By using different DNA linkers built upon a 7000 bp scaffold, the authors were able to position 
two CNTs into a perpendicular cross-junction (left). This architecture was capable of displaying a 
FET behavior (right). Adapted by permission [48], Copyright 2009, Springer. b following up this 
work, Han et al. used DNA linkers to control the pitch between the CNTs by using different sized 
linkers. The left image shows six CNTs positioned with 20 bp linkers creating an array pitch of 8.5 nm,
whereas in the right image, the linkers are 60 bp and create a pitch of 22 nm. Adapted with permission
from [49]. Copyright 2012, American Chemical Society. c using a different design architecture, 
Mangalum et al. also achieved parallelly positioned CNTs using individual “block” DNA origami 
structures. These blocks contained ssDNA loops which enabled the blocks to be linked together to 
create a scaffold to position CNTs ~100 nm apart, over a distance of 500 nm, bringing nanometer 
architectures close to micrometer size domains. Adapted with permission from [50]. Copyright 
2013, American Chemical Society. d rather than relying on DNA wrapping, Pei et al. created an 
amide bond between the ssDNA and the ends of the CNTs (left). By using a one-point linkage, they 
can utilize double-stranded DNA hybridization to create origami rafts (right) which can pivot round 
the linkage point to positionally control the CNTs. Adapted with permission from [53]. Copyright 
2019, American Chemical Society 


Even more complex “Y-shape” nanostructures have been created that involve a 
three-way DNA junction with protruding single-stranded sequences that anneal to 
CNTs. These nanostructures orient the three CNTs at an approximate 120° angle to 
each other. Further, higher-ordered networks of each Y-shaped triplet can be assembled
to create a DNA-CNT “mesh” [52]. Recently, Seeman’s group [53] has demonstrated
that rather than adsorbing DNA to the sides of SWCNTs via van der Waals
forces, DNA can be specifically localized to the ends of SWCNT structures via
conjugation chemistry between a pendant amino-group on the DNA molecule and
carboxylates presented on the termini of the SWCNT. By orienting DNA to the termini
of the nanotubes, the conductive lattice structure on the sides is not impeded,
reducing interference with the electrical properties of the SWCNT. The ssDNA conjugated
to the reactive end of a SWCNT can thus be used to orient multiple nanotubes
via hybridization, creating precisely positioned SWCNTs within a two-dimensional 
DNA origami “raft” [53] (Fig. 4d). In another approach to organizing supramolec- 
ular structures, a conductive SNA nanoparticle “linker” was used for the parallel 
positioning of DNA-conjugated carbon nanotubes, resulting in a five-fold binding 
improvement over just relying on the stickiness of the nanotubes themselves [54]. 


3.4 Highly Ordered, Three-Dimensional DNA-CNT Arrays 


Two publications, released on the same day, go a long way toward the implementation 
of direct biotemplated CNT nanofabrication using DNA origami “bricks.” Sun et al. 
[55] developed a highly scalable supramolecular assembly method, termed Spatially 
Hindered Integration of Nanotube Electronics (SHINE). Densely packed, parallel 
arrays of CNTs were fabricated within an array of channels, achieving a precise 
intertube pitch of 10.4 nm (Fig. 5a). Using the SHINE assembly method, Zhao et al. [56] 
then created a multichannel p-channel metal-oxide semiconductor field-effect tran- 
sistor (FET) by fixing the DNA-templated CNTs onto a polymer-templated silicon 
wafer. The CNTs were fixed into position with metal bars, and then, electrodes and 
gate dielectrics were deposited onto the device (Fig. 5b). The relatively poor 
electronic performance often observed with biotemplating was remedied by removing 
contaminating DNA and metal ions with ultra-pure water, low-concentration H2O2, 
and thermal annealing after the CNT fixing stage. This rinsing-after-fixing approach 
improved the key transport performance metrics by a factor of 10 compared to the 
previous biotemplated FETs. This approach highlights the advantage of using DNA 
origami for the organization of truly three-dimensional architectures. 


4 The Future of DNA-Organized Electronics 


The future of DNA-organized electronics will likely resemble its past and present 
but will build on substantive technical advances that fall outside the field. The key 
enabler, the ability to organize and ultimately design structure at the atomic scale, 
will be augmented by new opportunities for using novel mechanisms to mediate 
electron flow. In particular, it should be possible to either make DNA itself more 
conductive by changing its fundamental chemistry, or to better adapt biocompatible 
electronic materials to nanostructured DNA scaffolds. 
Fig. 5 Creating highly ordered CNTs in three-dimensional architectures. a the revolutionary 
spatially hindered integration of nanotube electronics (SHINE) method developed by Sun et al. 
DNA origami nano-bricks were assembled into three-dimensional side walls and bottom layers, 
creating trenches in which DNA handles sit (left). When CNTs wrapped with DNA anti-handle 
linkers are added (right), they seat themselves within the confined trenches, creating a highly 
parallel ordered array with a precise pitch of 10.4 nm. This pitch can be controlled by using 
different DNA origami sidewall thicknesses. TEM imaging shows the specific topography of the 
array, with the CNT positions indicated by the yellow arrows. From [55]. Adapted with permission 
from AAAS. b using the SHINE method, Zhao et al. constructed a high-performance FET. In short, 
the authors constructed a 10.4 nm pitch CNT array, added metal bars, source, drain, and gate 
dielectrics followed by rinsing to remove the DNA and any contaminating metal ions before finally 
adding the gate. The current-voltage curves (right) compare devices without DNA removal and 
thermal annealing (gray line) and devices after thermal annealing (red line); the treatment 
improves the FET performance by an order of magnitude. From [56]. Adapted with permission from AAAS 


4.1 Making DNA More Electronic 


While there are now extremely interesting opportunities for creating Schottky junc- 
tions between DNA-patterned and non-DNA-patterned materials, as described above, 
the opportunities for atomic-scale control over geometry and charge flow, as envisioned 
by Seeman [2], remain largely unrealized to date. The key to further enablement 
may come through the control of chemistry that is afforded by DNA nanotechnology. 
It is relatively simple to append electronic elements (the most obvious being 
ferrocene) to nucleobases, allowing more precise control over where electron transfer 
occurs. Further, the introduction of unnatural nucleobases offers additional opportunities 
for atomic-level placement, including non-standard base pairs that specifically 
chelate metals at the Watson-Crick interface [57]. 


4.2 Scaffolding Biocompatible Electronic Materials 


While the electronic properties of native DNA can be improved, it is still somewhat 
surprising that it is a conductive material at all, given the lack of obvious redox 
moieties available [58]. Recently, there has been a paradigm shift (entirely in the 
Kuhnian sense) in our understanding of how biological materials can potentially 
transfer electrons. The intermediacy of redox active compounds remains a key feature 
of bio-enabled electrochemistry and has been successfully utilized in constructing 
photonic DNA-organized excitonic circuits [59-62]. However, the opportunities for 
long distance electron transfer in the absence of such intermediacy are becoming 
more and more apparent. This has long been the case for tunneling between organic 
cofactors in cytochrome couples (for example) but has only recently been implicated 
for peptides and proteins that lack cofactors. 

A particularly relevant example of this shift in understanding has been garnered 
from the study of a synthetic 29-mer peptide that forms an antiparallel coiled- 
coil hexamer (ACC-Hex) and has been shown to have a remarkably high electrical 
conductivity [63]. The mechanism of electron transport is currently under debate, 
since features of both ohmic electronic transport and metallic-like temperature dependence 
are observed, despite the fact that it contains neither extended conjugation, 
π-stacking, nor redox centers. A similar story seems to be unfolding in the study 
of electrically conductive protein nanowires (e-PNs). Geobacter sulfurreducens, a 
common anaerobic species of bacteria found in anoxic subsurface sediments, is 
capable of respiring via the reduction of metals, such as Fe(III). These bacteria are 
able to accomplish this feat by producing long thin filaments that protrude from 
the cell and electronically link the cell to the metal [64]. Similar to the ACC-Hex 
fibers, purified e-PNs display ohmic electronic transport and metallic-like tempera- 
ture dependence. They also contain a high density of aromatic amino acids positioned 
conservatively throughout one of their constituent proteins [65]. Removal of the 
aromatic amino acids drastically reduces the electrical conductivity of the nanowires 
[64]. 

The potentially critical role of aromatic amino acids in electronic conduction 
presents a conundrum about what novel electron transport mechanisms may be 
available to biological polymers. Irrespective of the underlying physics, there has 
been a remarkable plethora of practical devices produced from the e-pili [66-68], 
including generating energy from environmental ambient humidity [69]. Currently, 
these methods involve using thin-film approaches to create devices, and as we have 
seen in other aspects of microelectronics, this can ultimately limit the overall device 
architecture and functionality. As with other electronic components, such as CNTs, 
the use of DNA origami now offers a truly unique opportunity to contort and position 
Fig. 6 e-PN engineering to create hybrid protein-DNA electronic devices. One type of e-PN is 
formed from the monomeric type-IV pilin protein of G. sulfurreducens, which can be modified 
by engineering the carboxy-terminal domains exposed to the environment on the surface of the 
nanowire. At this terminus, fused specialty peptides, which can bind a multitude of 
analytes, including metal ions, nanoparticles, proteins, or other molecules, can create interactions 
that modulate electrical properties. Additionally, the inclusion of non-canonical amino acids 
can further increase the range of chemistries available for binding and electron transfer. Bound 
molecules and engineered peptides can further connect with three-dimensional origami scaffolds 
(the Sun et al. SHINE design is shown) to create bio-produced field-effect transistors which can be 
used to sense specific analytes for biosensing applications 


these conductive wires into specific orientations (Fig. 6). However, the potential for 
developing sophisticated devices goes well beyond CNTs, since a variety of prop- 
erties in the pili can be tuned via genetics: They can be functionalized with peptide 
domains to bind specifically to sites on DNA nanostructures, going beyond the 
terminal attachment of carbon nanotubes; the number and type of aromatic residues 
available for conductance can be modulated at will; the length and diameter of the 
pili can be controlled biologically. The use of DNA origami to organize protein- 
based nanowires into electrode arrays would represent an entirely new direction in 
bioelectronics, one which we believe could finally realize the potential of Dennard’s 
and Moore’s laws at the nanometer-scale. 
DNA Assembly of Dye Aggregates: A 
Possible Path to Quantum Computing 


Bernard Yurke 


Abstract DNA-based self-assembly enables the programmable arrangement of mat- 
ter on a molecular scale. It holds promise as a means with which to fabricate high tech- 
nology products. DNA-based self-assembly has been used to arrange chromophores 
(dye molecules) covalently linked to DNA to form Förster resonant energy transfer 
and exciton-based devices. Here we explore the possibility of making coherent exci- 
ton information processing devices, including quantum computers. The focus will 
be on describing the chromophore arrangements needed to implement a complete 
set of gates that would enable universal quantum computation. 


1 Introduction 


Already in the earliest days of DNA nanotechnology, Richardson and Seeman imag- 
ined using DNA nanotechnology to fabricate electronic information storage and 
processing devices, such as three-dimensional memories in which the wires and 
switching elements were of molecular scale [1]. My own entry into this field, in the 
late 1990s, at Bell Laboratories, was motivated by this vision. The tools of DNA 
nanotechnology have undergone considerable development since those early days 
[2, 3], but its use in the construction of electronic or photonic information pro- 
cessing systems is still in its infancy. One of the more developed areas is the use 
of DNA nanotechnology to arrange metallic nanoparticles, quantum dots and dyes 
in desired configurations [4-9] (here referred to as aggregates), aimed at making 
photonic devices. A recent development is the use of DNA self-assembly to con- 
struct dye aggregates in which the distance between the dyes is sufficiently close 
that excited-state energy can be transferred between neighboring dyes in a manner 
that maintains quantum coherence [10-17]. The packet of energy that is transferred 
between dyes is referred to as an exciton, and it exhibits quantum mechanical 
particle-like properties. The possibility of using such aggregates as quantum gates 
and quantum computers was raised by Castellanos et al. [18, 19]. They exhibited 
dye aggregate configurations that implemented the function of a set of three types 
of gates that would enable universal quantum computation. As the authors point out, 
however, their embodiment of these gates does not allow one to exploit the full power 
of quantum computation. To accomplish that, the authors note, use must be made 
of the interaction between two excitons. Here a set of gates is provided for which the 
interaction between two excitons is employed in the implementation of the controlled 
not (CNOT) gate. These gates do form a complete set enabling universal quantum 
computation. The quantum computation scheme employed is that of a many particle 
quantum walk, similar to the scheme proposed by Childs et al. [20]; however, here 
a dual-rail architecture is employed that greatly simplifies gate design. 

Random perturbations of excitons by molecular vibrations tend to wash out the 
delicate quantum interference effects on which quantum computing relies. Other 
would-be quantum computing technologies also must contend with random pro- 
cesses by which quantum interference effects are washed out. This problem, how- 
ever, is sufficiently severe for exciton-based quantum gates that it poses a significant 
challenge to exciton-based quantum computation’s capacity to win the race against 
all the other quantum computation technologies being developed [21-23]. Whether 
dyes can be synthesized for which the interaction between excitons and molecular 
vibrations is sufficiently weak to enable more than the demonstration of rudimentary 
quantum computing circuits remains to be seen. The employment of devices based on 
molecular transitions at the energy scale of visible light photons does open, however, 
the prospect for quantum gate operation at room temperature and with femtosecond 
switching speeds. Quantum computation through the assembly of dye aggregates via 
DNA-based self-assembly thus provides a reach goal to drive the development of 
high technology photonic devices in which the gates are of molecular scale, have 
high switching speeds, and the circuits have high component density. 

Although the scope here is limited to dye aggregate architectures that can function 
as exciton-based quantum gates for universal quantum computation, with the implicit 
understanding that currently DNA self-assembly is the most promising means by 
which to assemble such aggregates, it is worth noting that the study of excitons in dye 
aggregates and molecular crystals has a long and extensive history [24]. This history 
predates the field of DNA assembly, beginning with theoretical work by Frenkel 
[25] in 1931 and the experimental work by Jelley [26] in 1936 and by Scheibe et al. 
[27-29] in 1937. 

As indicated, here we explore the possibility of making information processing 
devices out of dye aggregates constructed by DNA self-assembly. Dye molecules 
exhibit color due to their ability to absorb light at specific wavelengths. The process 
occurs because the molecule has a transition from the ground state to an excited state 
that can be induced by the absorption of a visible light photon. Such a transition is 
said to be optically allowed in order to distinguish it from transitions that require a 
change in the total electron spin angular momentum, something that the absorption 
of a photon alone cannot do, or transitions that a photon cannot induce for symmetry 
or other reasons. 
The bundle of energy stored in the molecule upon absorption of a photon can be 
transferred from one dye molecule to a neighboring dye molecule. In this process one 
dye molecule returns to its ground state while simultaneously the other is promoted 
from its ground state to its optically allowed excited state. This bundle of energy that 
once was a photon can thus propagate from dye molecule to dye molecule throughout 
an aggregate of dye molecules. In this manner the bundle of energy behaves like a 
particle and could be used as a carrier of information much like an electron. 

This bundle of energy, which resides on one dye molecule at a time, is referred to 
as an exciton. Technically, this bundle of energy is referred to as a Frenkel exciton to 
distinguish it from a Wannier-Mott exciton, which is an electron-hole pair residing 
in a semiconductor material. Either is simply referred to as an exciton, when it is 
clear from context which type of exciton is meant. 

When the spacing between dyes becomes 2 nm or less, the hopping occurs in a 
coherent manner, which enables the exciton to behave like a quantum mechanical 
particle exhibiting wave-like behavior. This is where quantum mechanical wave- 
particle duality enters. Even though at any instant of time the exciton resides on only 
one dye, the exciton behaves as if it is a wave spread out over the entire dye aggregate. 

When a pair of neighboring dyes are both in their excited state, the electrostatic 
energy between the dyes will be different from that when only one dye is excited 
due to the change in a dye’s electron density distribution that occurs when the dye 
transitions from the ground state to the excited state. This gives rise to a two-body 
or exciton-exciton interaction in which excitons scatter off of each other. 

These two properties, the ability of an exciton to coherently hop from dye to dye 
and the ability of two excitons to scatter off of each other, in principle, enable suit- 
ably structured dye aggregates to function as quantum gates, information processing 
systems and maybe even as quantum computers. 

In the following sections a set of dye configurations will be described that form 
a complete set of gates for universal quantum computation. Also described is how 
these gates can be assembled into functioning information processing systems and 
quantum computers. Nonidealities that must be overcome to realize such devices are 
also discussed. 


2 The Mathematical Structure of Reality 


Here a brief introduction to quantum mechanics is provided with the aim to give 
some indication of where the power of quantum computing resides. Quantum theory 
has survived 100 years of rigorous testing and, as near as can be discerned, pro- 
vides the foundational description of all physical phenomena. This includes physical 
computation processes. In this sense all computers are quantum computers. But not 
all computers take advantage of the additional computing resources that quantum 
mechanics provides beyond those utilized by computers relying on classical physics. 

As an indication of how reality, by which we mean the state of a physical system, 
is represented in quantum mechanics, consider first a classical mechanical system. 
At any given instant of time, such a system will possess a number of attributes that 
can be measured such as its position, its momentum, angular momentum and energy. 
The list of numbers needed to specify the state of the system can be shorter than the 
complete list of attributes. For example, for a point particle the energy and angular 
momentum can be computed if the position and momentum are known. Thus, the 
particle’s position and momentum provide a complete specification of its state. In 
contrast to classical systems, in which the attributes can take on any real number 
value, in quantum mechanical systems, a measurement of an attribute often will 
only yield one of a discrete list of values: that is, the attribute is quantized. For 
example, for an electron in a hydrogen atom, the total angular momentum $J$ and the 
$z$-component of the total angular momentum $J_z$, if measured, take on the discrete 
values $\hbar\sqrt{j(j+1)}$ and $\hbar m_j$, respectively, where $\hbar$ is the reduced Planck 
constant, $j$ is a half integer greater than zero, that is, $j \in \{1/2, 3/2, 5/2, \ldots\}$, 
and $m_j$ is a half integer in the range $-j \le m_j \le j$. Given $j$ and $m_j$, the energy of 
the electron can be computed. Indeed, $j$ and $m_j$ provide a complete specification of the 
state of the electron for which $j$ and $m_j$ have been measured, and this state is often 
denoted by $|j, m_j\rangle$. Such states, however, do not exhaust the states that the electron 
can be in. The electron can be in any state $|\psi\rangle$ of the form 


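$$|\psi\rangle = \sum_{j=1/2}^{\infty} \sum_{m_j=-j}^{j} \alpha_{j,m_j}\, |j, m_j\rangle, \tag{1}$$


where the $\alpha_{j,m_j}$ are complex numbers subject to the constraint 


$$\sum_{j=1/2}^{\infty} \sum_{m_j=-j}^{j} |\alpha_{j,m_j}|^2 = 1. \tag{2}$$


Note that the state $|\psi\rangle$ can be represented as a column vector listing all the $\alpha_{j,m_j}$: 


$$|\psi\rangle = \begin{bmatrix} \alpha_{1/2,-1/2} \\ \alpha_{1/2,1/2} \\ \alpha_{3/2,-3/2} \\ \alpha_{3/2,-1/2} \\ \alpha_{3/2,1/2} \\ \alpha_{3/2,3/2} \\ \vdots \end{bmatrix}. \tag{3}$$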


If one performs a measurement of $J$ and $J_z$ on this state, the probability of obtaining 
the quantum numbers $j$ and $m_j$ from the measurement is $|\alpha_{j,m_j}|^2$. The measurement 
of $J$ and $J_z$ must yield some value for $j$ and $m_j$. Hence, Eq. (2) is simply 
the statement that the sum of the probabilities of measurement outcomes over all 
possible measurement outcomes must be 1. It is tempting to interpret $|\psi\rangle$ of Eq. (1) 
as representing a statistical ensemble for which $|\alpha_{j,m_j}|^2$ is the fraction of systems 
in the state $|j, m_j\rangle$; however, with more than one nonzero $\alpha_{j,m_j}$, states having the 
form of Eq. (1) can give rise to observable interference effects if the $x$-component $J_x$ or 
$y$-component $J_y$ of the angular momentum is measured, thereby making an ensemble 
interpretation untenable. 

Having introduced quantum mechanical state vectors via the example of an elec- 
tron in a hydrogen atom, the discussion will now proceed on a more general and 
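abstract level. The state $|\psi\rangle$ of a quantum mechanical system consists of an 
$N$-dimensional vector where $N$ could be $\infty$. The state can be represented by a column 
vector 


$$|\psi\rangle = \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix}, \tag{4}$$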


where the $\alpha_m$ are complex functions of time. The Hermitian adjoint (the matrix that 
results when one interchanges rows and columns of a matrix and takes the complex 
conjugate of the matrix elements) of $|\psi\rangle$ is given by 


$$\langle\psi| = \begin{bmatrix} \alpha_1^* & \alpha_2^* & \cdots & \alpha_N^* \end{bmatrix}. \tag{5}$$


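To provide quantum mechanics with a probability interpretation, a norm of the state 
vector $|\psi\rangle$ is introduced and denoted by $\langle\psi|\psi\rangle$ and defined by the matrix product 


$$\langle\psi|\psi\rangle = \begin{bmatrix} \alpha_1^* & \alpha_2^* & \cdots & \alpha_N^* \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_N \end{bmatrix} = \sum_{m=1}^{N} |\alpha_m|^2. \tag{6}$$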


The probability interpretation is imposed by requiring the state vector to have unit 
norm $\langle\psi|\psi\rangle = 1$. Then $|\alpha_m|^2$ is the probability of finding the system in the state 
represented by the $m$th position of the state vector. 
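
As a concrete illustration of Eq. (6) and the probability interpretation, the following minimal Python sketch normalizes a state vector and lists the probabilities of finding the system in each basis state. The three-component example state and the helper names (`norm_squared`, `normalize`, `probabilities`) are illustrative assumptions and do not come from the treatment above.

```python
import math

def norm_squared(state):
    """Return <psi|psi>, the sum over m of |alpha_m|^2."""
    return sum(abs(a) ** 2 for a in state)

def normalize(state):
    """Rescale the amplitudes so that <psi|psi> = 1."""
    n = math.sqrt(norm_squared(state))
    if n == 0:
        raise ValueError("the zero vector cannot be normalized")
    return [a / n for a in state]

def probabilities(state):
    """Probability |alpha_m|^2 of finding the system in each basis state."""
    return [abs(a) ** 2 for a in state]

# Hypothetical three-state system with arbitrary complex amplitudes.
psi = normalize([1 + 1j, 2, -1j])
print(probabilities(psi))       # the three probabilities
print(sum(probabilities(psi)))  # their sum, which should be 1 up to rounding
```

The unnormalized amplitudes have squared magnitudes 2, 4, and 1, so after normalization the probabilities are 2/7, 4/7, and 1/7, independent of the complex phases of the amplitudes.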

The dynamics of a quantum mechanical system is governed by the Schrödinger 
equation 


$$i\hbar \frac{\partial}{\partial t} |\psi\rangle = H |\psi\rangle, \tag{7}$$


where $t$ is the time and $H$ is the Hamiltonian. The Hamiltonian is an $N \times N$ array. 
The values for the elements of this array depend on the system under consideration 
and are ultimately determined by experiment, though procedures, such as canonical 
quantization, are available that enable one to generate the Hamiltonian from a 
classical mechanical description of the system. The Hamiltonian is required to be 
Hermitian, that is, if the rows and columns are interchanged and the complex con- 
jugate of all the matrix elements is taken, one winds up with the same matrix. This 
ensures that the Hamiltonian only has real eigenvalues, which turn out to be the 
energy eigenvalues of the system. 

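Equation (7) can be formally integrated to yield 


$$|\psi(t)\rangle = U(t, t_0)\, |\psi(t_0)\rangle, \tag{8}$$


where 


$$U(t, t_0) = \exp\left(-\frac{i}{\hbar} \int_{t_0}^{t} H\, dt'\right). \tag{9}$$


Since $H$ is Hermitian, the Hermitian conjugate of $U(t, t_0)$ is 


$$U^\dagger(t, t_0) = \exp\left(\frac{i}{\hbar} \int_{t_0}^{t} H\, dt'\right). \tag{10}$$


It then follows that $U(t, t_0)$ is unitary, that is 


$$U^\dagger(t, t_0)\, U(t, t_0) = U(t, t_0)\, U^\dagger(t, t_0) = 1. \tag{11}$$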


Thus, governed by the Schrödinger equation a system undergoes unitary time evolution. 
An important consequence of this is that the norm of $|\psi\rangle$ does not change with 
time, that is, probability is conserved in the sense that the sum of the probabilities of 
outcomes over all possible outcomes is 1, as required for a consistent probabilistic 
interpretation. 
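
The unitarity of the time evolution and the resulting conservation of probability can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the Hamiltonian is taken to be time independent, so that the evolution operator reduces to exp(-iH(t - t0)/ħ); the matrix exponential is evaluated by a plain Taylor series, chosen for self-containment rather than efficiency; and the example Hamiltonian, a hypothetical pair of identical dyes with site energy `E` and exchange coupling `J` in units where ħ = 1, is an assumed toy model rather than one of the aggregates discussed in this chapter.

```python
import math

def matmul(a, b):
    """Product of two square matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def dagger(a):
    """Hermitian adjoint: transpose plus complex conjugation."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]

def evolution_operator(h, dt, hbar=1.0, terms=60):
    """U = exp(-i H dt / hbar) for a time-independent H, by Taylor series."""
    n = len(h)
    a = [[-1j * h[i][j] * dt / hbar for j in range(n)] for i in range(n)]
    u = [[complex(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in u]
    for k in range(1, terms):
        # term becomes a^k / k!
        term = [[x / k for x in row] for row in matmul(term, a)]
        u = [[u[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return u

# Assumed toy model: two identical dyes, site energy E, coupling J, hbar = 1.
E, J, dt = 2.0, 0.5, 1.3
H = [[E, J], [J, E]]
U = evolution_operator(H, dt)

# Unitarity check: U^dagger U should be the identity matrix.
UdU = matmul(dagger(U), U)
unitarity_error = max(abs(UdU[i][j] - (i == j))
                      for i in range(2) for j in range(2))

# Start with the excitation on dye 1 and evolve; the norm should stay 1.
psi0 = [1.0, 0.0]
psi_t = [sum(U[r][c] * psi0[c] for c in range(2)) for r in range(2)]
norm_t = sum(abs(x) ** 2 for x in psi_t)
print(unitarity_error, norm_t, abs(psi_t[1]) ** 2)
```

For this two-site model the probability of finding the excitation on the second dye is sin²(J dt/ħ), so the last printed number can be compared against `math.sin(J * dt) ** 2`.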


3 Quantum Computers 


Having outlined the mathematical structure of reality, an indication is now provided 
of how the power of quantum computation arises. Apart from unitarity, quantum 
mechanics places no further restrictions on the unitary transformations that can be 
physically realized. Of course, the challenge is to find or engineer physical systems 
that can perform the unitary transformations that carry out the desired computation. 
Here we take for granted that for any desired unitary transformation a physical system 
can be engineered that carries out that unitary transformation. 

It is convenient to introduce some further notation. The state vector Eq. (4) can 
be written as 


$$|\psi\rangle = \sum_{m=1}^{N} \alpha_m |m\rangle, \tag{12}$$


where the $|m\rangle$ are time-independent unit basis vectors satisfying the orthonormality 
condition 


$$\langle m|m'\rangle = \delta_{m,m'}, \tag{13}$$


where $\delta_{m,m'}$ is the Kronecker delta function. The Kronecker delta function is zero 
when $m \neq m'$ and one when $m = m'$. 

Consider now a two-state system where one basis state has been arbitrarily chosen 
to represent a logical 0, whereas the orthogonal basis state is chosen to represent a 
logical 1. Denoting these states by $|0\rangle$ and $|1\rangle$, respectively, an arbitrary state of the 
system is then given by 


$$|\psi\rangle = \alpha_0 |0\rangle + \alpha_1 |1\rangle = \begin{bmatrix} \alpha_0 \\ \alpha_1 \end{bmatrix}. \tag{14}$$


This $|\psi\rangle$ could be the state of a binary memory register. A classical binary memory 
register can be in only one of two mutually exclusive states, a logical 0 or a logical 1. In 
contrast, a quantum mechanical binary memory register can be in any state of the form 
of Eq. (14). In the general case, the contents of the binary memory register must be 
specified by two complex numbers, $\alpha_0$ and $\alpha_1$, with the restriction $|\alpha_0|^2 + |\alpha_1|^2 = 1$. 
Although $|\alpha_0|^2$ and $|\alpha_1|^2$ are the probabilities of finding the register content to be a 
logical 0 or a logical 1 if one looks at the contents, this does not represent a statistical 
mixture in which the memory register is in one state or the other. Otherwise a single 
number, $|\alpha_0|^2$, would be sufficient to describe the state of the system rather than 
two complex numbers with a constraint on the sum of their squared norms. The information 
encoded by the state Eq. (14) is referred to as a qubit. Its logical value is specified 
by the complex numbers $\alpha_0$ and $\alpha_1$. 
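A general unitary transformation of the state Eq. (14) has the form 


$$\begin{bmatrix} \alpha_0(t_1) \\ \alpha_1(t_1) \end{bmatrix} = U \begin{bmatrix} \alpha_0(t_0) \\ \alpha_1(t_0) \end{bmatrix}, \tag{15}$$


where 


$$U = \begin{bmatrix} u_{0,0} & u_{0,1} \\ u_{1,0} & u_{1,1} \end{bmatrix}. \tag{16}$$


In general, the $u_{m,n}$ are complex numbers subject to the constraint that $U$ be unitary, 
that is, $U^\dagger U = U U^\dagger = 1$. Here a quantum mechanical logic gate performing such 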
a transformation is referred to as a basis change gate. Note that this gate is a one 
input and one output gate and consequently could be regarded as a generalization of 
a classical NOT gate. A special case of the basis change gate is the Hadamard gate, 
which is given by 


$$U_H = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}. \tag{17}$$


This gate frequently shows up in quantum computing circuits. Suppose that initially 
the memory register is in the logical 0 state $|0\rangle$; then, subjected to a physical system 
that transforms the memory contents according to Eq. (17), the final state of the 
memory register will be $|\psi\rangle = (1/\sqrt{2})(|0\rangle + |1\rangle)$. A basis change gate thus provides 
a means for putting a memory element into a superposition state when initially it is 
in the logical 0 or logical 1 state. Another special case of the one input gate Eq. (15) 
is the phase gate: 


$$U_P = \begin{bmatrix} e^{i\phi_0} & 0 \\ 0 & e^{i\phi_1} \end{bmatrix}. \tag{18}$$


This gate transforms the phases of the elements of a superposition state. For example, 
if the initial state of the memory element is given by $|\psi(t_0)\rangle = (1/\sqrt{2})(|0\rangle + |1\rangle)$, a 
physical system performing the operation Eq. (18) on the memory element will put 
it in the state $|\psi(t_1)\rangle = (1/\sqrt{2})(e^{i\phi_0}|0\rangle + e^{i\phi_1}|1\rangle)$. 
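
The action of the Hadamard gate, Eq. (17), and the phase gate, Eq. (18), on a single qubit can be reproduced with a few lines of Python. This is a minimal sketch: the helper names `apply_gate` and `phase_gate`, the argument names `phi0` and `phi1` for the two phase angles, and the choice of a quarter-turn phase on the logical 1 component are assumptions made for illustration.

```python
import cmath
import math

def apply_gate(u, state):
    """Multiply a gate matrix by a column vector of amplitudes."""
    return [sum(u[r][c] * state[c] for c in range(len(state)))
            for r in range(len(u))]

S = 1 / math.sqrt(2)
HADAMARD = [[S, S], [S, -S]]  # matrix of Eq. (17)

def phase_gate(phi0, phi1):
    """Diagonal matrix of Eq. (18) with phase angles phi0 and phi1."""
    return [[cmath.exp(1j * phi0), 0], [0, cmath.exp(1j * phi1)]]

zero = [1, 0]                                   # logical 0
plus = apply_gate(HADAMARD, zero)               # equal superposition
shifted = apply_gate(phase_gate(0.0, math.pi / 2), plus)
back = apply_gate(HADAMARD, plus)               # Hadamard applied twice

print(plus)
print(shifted)
print(back)
```

The phase gate leaves both measurement probabilities at 1/2 and changes only the relative phase of the two components, while applying the Hadamard gate a second time returns the register to the logical 0 state. A statistical mixture of logical 0 and logical 1 would not show this second behavior, which is one way to see the interference mentioned earlier.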


3.1 The Controlled NOT Gate 


It is noted that the phenomenon of classical wave interference is sufficient to real- 
ize any unitary transformation. For example, any unitary transformation on an N 
dimensional vector can be realized by an array of N(N — 1)/2 optical beam splitters 
[30]. However, for certain unitary transformations, quantum mechanics provides a 
means to greatly reduce the number of parts needed to implement the unitary trans- 
formation. To indicate how this works the controlled not (CNOT) gate will now be 
considered. 

The CNOT gate is a two-input two-output gate that is a generalization of a classical 
exclusive or (XOR) operation. Since this is a two-input gate, two interacting two-level 
systems are required to implement the gate, the control system and the target 
system. Let the Boolean basis states of the control system be denoted by $|0\rangle_C$ and 
$|1\rangle_C$ and the Boolean basis states of the target system be denoted by $|0\rangle_T$ and $|1\rangle_T$. 
The state space for the complete system is the tensor product of the state spaces of the 
two subsystems. Hence, the basis states for the complete system can be written as 


$$\begin{aligned}
|1\rangle &= |0\rangle_C |0\rangle_T = |0, 0\rangle \\
|2\rangle &= |0\rangle_C |1\rangle_T = |0, 1\rangle \\
|3\rangle &= |1\rangle_C |0\rangle_T = |1, 0\rangle \\
|4\rangle &= |1\rangle_C |1\rangle_T = |1, 1\rangle.
\end{aligned} \tag{19}$$


In the rightmost equalities the states have been written in the form $|x, y\rangle$ where $x$ is 
the Boolean value of the control basis state and $y$ is the Boolean value of the target 
basis state. In this basis, the unitary transformation performed by the CNOT gate has 
the matrix representation 


Fig. 1 Symbol for a CNOT gate. The input and output line labeled x is the control line. The target 
input is y. The target output is the XOR of x and y 


$$U_{\mathrm{CNOT}} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix}. \tag{20}$$


On the Boolean basis states this CNOT operation performs the following transfor- 
mations: 


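$$\begin{aligned}
|0, 0\rangle &\to |0, 0\rangle \\
|0, 1\rangle &\to |0, 1\rangle \\
|1, 0\rangle &\to |1, 1\rangle \\
|1, 1\rangle &\to |1, 0\rangle
\end{aligned} \tag{21}$$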


From this it is apparent that the control state $x$ remains unchanged, whereas the target 
state $y$ is transformed to $x \oplus y$, the XOR operation. The symbol for a CNOT gate in 
quantum computer diagrams is given in Fig. 1. 

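In general, the initial state of our system of two two-state subsystems has the form 


$$|\psi(t_0)\rangle = \begin{bmatrix} \alpha_{0,0} \\ \alpha_{0,1} \\ \alpha_{1,0} \\ \alpha_{1,1} \end{bmatrix} = \alpha_{0,0}|0, 0\rangle + \alpha_{0,1}|0, 1\rangle + \alpha_{1,0}|1, 0\rangle + \alpha_{1,1}|1, 1\rangle, \tag{22}$$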


where the state vector has been written in two different forms, that of a column vector 
and that in which basis vectors in ket notation are employed. The result of operation 
of the CNOT gate on this state is obtained by multiplying the column vector form of 
the state by the matrix Eq. (20) representing the unitary transformation performed 
by the CNOT gate, as Eq. (15) indicates. The resulting state is 


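$$|\psi(t_1)\rangle = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} \alpha_{0,0} \\ \alpha_{0,1} \\ \alpha_{1,0} \\ \alpha_{1,1} \end{bmatrix} = \begin{bmatrix} \alpha_{0,0} \\ \alpha_{0,1} \\ \alpha_{1,1} \\ \alpha_{1,0} \end{bmatrix} = \alpha_{0,0}|0, 0\rangle + \alpha_{0,1}|0, 1\rangle + \alpha_{1,0}|1, 1\rangle + \alpha_{1,1}|1, 0\rangle. \tag{23}$$

Comparing these last two equations one sees that the CNOT gate simultaneously 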
performs the correct logical operation on each Boolean component of the input state. 
This is an example of quantum parallelism. 
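
The matrix multiplication of Eq. (23) can be carried out explicitly for a particular input. The sketch below uses the basis ordering |0,0⟩, |0,1⟩, |1,0⟩, |1,1⟩ (control first, target second); the example amplitudes and the helper name `apply_gate` are assumptions chosen only so that each component is easy to track.

```python
# CNOT matrix of Eq. (20) in the basis |0,0>, |0,1>, |1,0>, |1,1>.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
LABELS = ["0,0", "0,1", "1,0", "1,1"]

def apply_gate(u, state):
    """Multiply a gate matrix by a column vector of amplitudes."""
    return [sum(u[r][c] * state[c] for c in range(len(state)))
            for r in range(len(u))]

# Hypothetical normalized input with four distinguishable amplitudes.
psi0 = [0.5, 0.5j, -0.5, -0.5j]
psi1 = apply_gate(CNOT, psi0)

for label, before, after in zip(LABELS, psi0, psi1):
    print(f"|{label}>  before: {before}  after: {after}")
```

The amplitudes attached to |0,0⟩ and |0,1⟩ are untouched, while those attached to |1,0⟩ and |1,1⟩ trade places, which is the componentwise XOR of Eq. (21) applied to every term of the superposition in a single matrix multiplication.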


3.2 Quantum Parallelism 


To generalize the discussion of CNOT gate operation, consider the case of a memory 
register consisting of N two-state systems. For a classical memory register, each bit 
element is either in a logical 0 or logical 1 state. As a consequence, the register can 
store only one N bit binary number at a time. Quantum mechanically the register 
can be in any state of the form 


$$|\psi(t_0)\rangle = \sum_{m_1=0}^{1} \sum_{m_2=0}^{1} \cdots \sum_{m_N=0}^{1} \alpha_{m_1,m_2,\ldots,m_N}\, |m_1, m_2, \ldots, m_N\rangle, \tag{24}$$


where, as indicated by the subscripts and superscripts of the sums, $m_i \in \{0, 1\}$. Each 
state label $m_1, m_2, \ldots, m_N$ is, thus, a binary number and the sum is over all possible 
$N$ bit binary numbers. Equation (24) is a generalization of Eq. (22). Thus, a quantum 
mechanical memory register consisting of $N$ two-level systems behaves as if it is 
simultaneously storing $2^N$ complex numbers subject to the constraint that the sum 
of the norm-squares of these complex numbers is 1. Singling out memory elements 
$i$ and $j$ for operation on by a CNOT gate, Eq. (24) can be written as 


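$$
\begin{aligned}
|\psi(t_0)\rangle ={}& \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 0_i \ldots 0_j \ldots m_N}\, |m_1, m_2, \ldots 0_i \ldots 0_j \ldots m_N\rangle \\
&+ \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 0_i \ldots 1_j \ldots m_N}\, |m_1, m_2, \ldots 0_i \ldots 1_j \ldots m_N\rangle \\
&+ \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 1_i \ldots 0_j \ldots m_N}\, |m_1, m_2, \ldots 1_i \ldots 0_j \ldots m_N\rangle \\
&+ \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 1_i \ldots 1_j \ldots m_N}\, |m_1, m_2, \ldots 1_i \ldots 1_j \ldots m_N\rangle,
\end{aligned} \tag{25}
$$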


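where the subscript on the sums indicates summation over all $m_n$, as in Eq. (24),
excluding the sums over $m_i$ and $m_j$. Upon operation by the CNOT gate, where i is the
control qubit and j is the target qubit, the contents of the memory register are
changed to

$$
\begin{aligned}
|\psi(t_1)\rangle ={}& \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 0_i \ldots 0_j \ldots m_N}\, |m_1, m_2, \ldots 0_i \ldots 0_j \ldots m_N\rangle \\
&+ \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 0_i \ldots 1_j \ldots m_N}\, |m_1, m_2, \ldots 0_i \ldots 1_j \ldots m_N\rangle \\
&+ \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 1_i \ldots 0_j \ldots m_N}\, |m_1, m_2, \ldots 1_i \ldots 1_j \ldots m_N\rangle \\
&+ \sum_{\{m_n \mid n \notin \{i,j\}\}} \alpha_{m_1,m_2,\ldots 1_i \ldots 1_j \ldots m_N}\, |m_1, m_2, \ldots 1_i \ldots 0_j \ldots m_N\rangle.
\end{aligned} \tag{26}
$$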


This equation is a generalization of Eq. (23). Thus, the CNOT gate simultaneously
performs the correct Boolean operation on each of the $2^N$ Boolean basis states of the
superposition.
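
The register-level action of Eqs. (25) and (26) can be sketched in a few lines. The example below is only an illustration under stated assumptions: the register is stored as a NumPy array of shape (2, ..., 2), the helper name `cnot_on_register` is hypothetical, and qubits are indexed from zero.

```python
import numpy as np

def cnot_on_register(alpha, control, target):
    """Apply a CNOT to an N-qubit register.

    alpha has shape (2,) * N, so alpha[m_1, ..., m_N] is the amplitude of
    |m_1, ..., m_N>. control and target are distinct zero-based axis indices.
    """
    alpha = np.asarray(alpha, dtype=complex)
    if control == target:
        raise ValueError("control and target must differ")
    out = alpha.copy()
    # Select the half of the register in which the control bit equals 1.
    index = [slice(None)] * alpha.ndim
    index[control] = 1
    block = alpha[tuple(index)]  # the control axis is removed by integer indexing
    # Position of the target axis inside that block.
    target_axis = target - 1 if target > control else target
    # Exchanging target values 0 and 1 is a flip along the target axis.
    out[tuple(index)] = np.flip(block, axis=target_axis)
    return out

# Assumed example: a random normalized 3-qubit state.
rng = np.random.default_rng(0)
N = 3
alpha = rng.normal(size=(2,) * N) + 1j * rng.normal(size=(2,) * N)
alpha /= np.linalg.norm(alpha)
result = cnot_on_register(alpha, control=0, target=2)
```

All $2^N$ amplitudes are relabeled in one step: those with the control bit equal to 0 are untouched, and those with the control bit equal to 1 have their target bit exchanged, as in the last two sums of Eq. (26).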

Not all computations can take advantage of this quantum parallelism. The
identification of computational tasks of practical interest that can benefit from quantum
speedup has been slow in coming, but such tasks include important problems such as
factoring large numbers and searching databases. The quantum computing algorithms for
these two problems were discovered by Shor [31] and Grover [32], respectively.

The CNOT gate and the Hadamard gate, together with the phase gates, form a
complete set of gates for universal quantum computation [33]. Thus, this set of gates
plays a role similar to that of the NAND gate of classical electronic circuit design,
the gate with which any Boolean function can be implemented.


4 The Frenkel Exciton Hamiltonian 


Having displayed a complete set of gates that enable universal quantum computation, 
we now work toward showing how these gates might be realized by dye aggregates. 
To this end, a Hamiltonian governing the dynamics of excitons is now introduced, the 
Frenkel exciton Hamiltonian. This is a phenomenological or reduced Hamiltonian, in 
that it contains parameters that must be determined by experiment or by calculation 
methods, such as time-dependent density functional theory, that are closer to first 
principles calculations. Here interaction of excitons with molecular vibrations is 
neglected. How these may be included is discussed in Sect. 16. 

The Frenkel exciton Hamiltonian is given by (Abramavicius et al. [34], Renger et
al. [35])

$$
\begin{aligned}
H ={}& \sum_{m=1}^{N} E^e_m B^\dagger_m B_m + \sum_{(m,n)} J_{(m,n)} \left( B^\dagger_m B_n + B^\dagger_n B_m \right) \\
&+ \frac{1}{2} \sum_{m=1}^{N} \Delta_m B^\dagger_m B^\dagger_m B_m B_m + \sum_{(m,n)} K_{(m,n)} B^\dagger_m B^\dagger_n B_n B_m.
\end{aligned} \tag{27}
$$

Here the summation index (m,n) denotes summation over all distinct pairs of dye
molecules, and N is the total number of dyes in the aggregate.

$E^e_m$ is the transition energy from the ground state to the first optically allowed
excited state of molecule m. $J_{(m,n)}$ is the exciton exchange energy, which arises from
the Coulomb interaction between the transition charge densities [34] of dye pair m
and n. $\Delta_m$ is the anharmonicity parameter [35] that quantifies the energy cost associated
with having two excitons occupy the same dye. It can be understood as follows. Let $S_0$
denote the ground state of the dye. The lowest optically allowed excited state is
denoted by $S_1$ and is the one-exciton state of the dye. The transition energy between
these two states is $E^e_m$. Let $S_n$ denote the excited state of the dye that has an allowed
optical transition from the state $S_1$ and whose energy lies closest to $2E^e_m$. Its energy can
be written as $2E^e_m + \Delta_m$, where $\Delta_m$ accounts for the energy mismatch. Because the
energy of this state is approximately twice that of the state $S_1$, it can be regarded
as the state for which two excitons reside on the dye. The anharmonicity parameter
thus provides a simple means to account for the existence of dye energy levels above
the first excited state that play a role when more than one exciton is present in
the aggregate. $K_{(m,n)}$ is the exciton-exciton interaction energy between an exciton
residing on dye m and an exciton residing on dye n. This interaction results from
the difference in the charge density of a dye molecule when it is in its excited state
compared to when it is in its ground state [34]. This results in a difference in the total
Coulomb energy of the aggregate when two excitons reside on neighboring dyes or
are farther apart.

These phenomenological parameters are amenable to engineering. The value of
$E^e_m$ depends on the dye structure and can be varied by changing substituents on the
dye. $J_{(m,n)}$ and $K_{(m,n)}$ depend on the structure of dyes m and n and how the dyes are
positioned and oriented with respect to each other. As indicated in the Appendix,
dye pairs exist for which $J_{(m,n)}$ and $K_{(m,n)}$ can be adjusted independently of each
other by reorienting the dyes. In what follows it is assumed that dye aggregate
systems can be engineered in which these phenomenological parameters, for nearest
neighbor dye pairs, can take on any desired value within the maximal limits of
these quantities for available dyes. Values of $E^e_m$ are in the several electron volt (eV)
range. For dye pairs in close proximity, $J_{(m,n)}$ can be in the 100 meV range [10].
On dimensional grounds, one expects that $K_{(m,n)}$ can achieve similar strength. As
discussed in the Appendix, the strengths of $J_{(m,n)}$ and $K_{(m,n)}$ can be estimated from
dipole approximations. In this approximation, $J_{(m,n)}$ is proportional to the product of
the transition dipole moments $\mu_m$ and $\mu_n$ for the two dyes m and n, whereas $K_{(m,n)}$ is
proportional to the product of the dipole moments $\Delta d_m$ and $\Delta d_n$. These latter dipoles
are referred to as excess dipoles in reference [36] and represent the difference between
the excited-state and ground-state charge densities for the two molecules. $\mu_m$ and
$\Delta d_m$ can both range as high as 16 debye [36-38]. The values of $J_{(m,n)}$ and $K_{(m,n)}$ can
be larger than the characteristic room-temperature thermal energy $k_B T = 26$ meV,
where $k_B$ is the Boltzmann constant and T is the absolute temperature of ~300 K.
This means that exciton-based quantum gates should function at room temperature, in
contrast to superconducting device-based quantum computers that require millikelvin
temperatures to operate.

The $B^\dagger_m$ and $B_m$ are not numerical quantities but operators referred to as exciton
creation operators and exciton annihilation operators, respectively. These are best
viewed as part of a clever and economical bookkeeping formalism that enables the
construction of the exciton state space and aids in keeping track of how the state
vector changes with time.

At this point it is useful to introduce the notion of an energy eigenstate and an 
energy eigenvalue. By direct substitution it can be shown that states of the form 


$$
|\psi\rangle = e^{-iE_k t/\hbar}\, |E_k\rangle, \tag{28}
$$

where $|E_k\rangle$ is time independent, are solutions to the Schrödinger equation Eq. (1)
provided the equation

$$
E_k |E_k\rangle = H |E_k\rangle \tag{29}
$$

is satisfied. A state satisfying Eq. (29) is said to be an energy eigenstate and $E_k$ is
said to be its energy eigenvalue. If the state space is N dimensional, N orthogonal
energy eigenstates will exist, enumerated by the integer subscripts k.

An aggregate system will have a lowest energy state in which all the dye molecules
are in their ground state. This state is denoted by $|0\rangle$ and is taken to have unit norm,
$\langle 0|0\rangle = 1$. The annihilation operator $B_m$ has the property that

$$
B_m |0\rangle = 0. \tag{30}
$$

Using this equation one immediately obtains

$$
H |0\rangle = 0. \tag{31}
$$

It follows that $|0\rangle$ is an energy eigenstate having energy eigenvalue 0. Hence, the
Hamiltonian Eq. (27) has been constructed so that the zero of its energy scale matches
the ground-state energy of the aggregate.
It is now useful to introduce the notion of a commutator. Given any two operators
A and B, their commutator is defined by

$$
[A, B] = AB - BA. \tag{32}
$$

The exciton creation and annihilation operators satisfy the commutation relations

$$
[B_m, B_n] = [B^\dagger_m, B^\dagger_n] = 0 \tag{33}
$$

and

$$
[B_m, B^\dagger_n] = \delta_{m,n}, \tag{34}
$$

where $\delta_{m,n}$ is the Kronecker delta function. The Kronecker delta function has the
value 0 when $m \neq n$ and the value 1 when $m = n$.

A complete set of orthonormal basis vectors can be constructed by applying
products of creation operators to $|0\rangle$. The general state will have the form

$$
|n_1, n_2, \ldots, n_N\rangle = \prod_{m=1}^{N} \frac{(B^\dagger_m)^{n_m}}{\sqrt{n_m!}}\, |0\rangle. \tag{35}
$$

The number $n_m$ is the number of excitons residing on dye m.
As the notation for the creation and annihilation operators suggests, $B^\dagger_m$ is regarded
as the Hermitian adjoint of $B_m$. In addition, the Hermitian adjoint of $|0\rangle$ is denoted
by $\langle 0|$. Hence, the Hermitian adjoint of the state vector Eq. (35) is given by

$$
\langle n_1, n_2, \ldots, n_N| = \langle 0| \prod_{m=1}^{N} \frac{(B_m)^{n_m}}{\sqrt{n_m!}}, \tag{36}
$$

where use has been made of the commutation relation Eq. (33), allowing reordering
of annihilation operators among themselves. Using the commutation relations for
the creation and annihilation operators, one can show that these states satisfy the
orthonormality condition

$$
\langle n_1, n_2, \ldots, n_N | n'_1, n'_2, \ldots, n'_N\rangle = \prod_{m=1}^{N} \delta_{n_m, n'_m}. \tag{37}
$$

The Hamiltonian and the state space have now been described. This system
belongs to a class of Hamiltonians referred to as Bose-Hubbard models [20]. It
has been shown that universal quantum computation can be performed by a
many-particle quantum walk in such systems [20]. The proof consists of showing how a
universal set of quantum gates can be implemented in such systems. A similar
analysis will be presented here by exhibiting a set of gates that may be easier to implement
in dye aggregate systems.

Before doing so, an analysis of a two-dye aggregate is performed to illustrate how 
computations are carried out with this formalism and to provide some insight into 
the quantum behavior of excitons. 
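
Before turning to the analytic treatment, the operator formalism can be made concrete numerically. The sketch below builds the Hamiltonian of Eq. (27) for N = 2 dyes as a matrix. It rests on several assumptions that are not part of the original text: each dye is truncated at `n_max` excitons (true boson modes are unbounded), the function names are hypothetical, and the parameter values (in eV) are purely illustrative.

```python
import numpy as np

def truncated_annihilation(n_max):
    """Annihilation operator on occupations 0..n_max (a truncated boson mode)."""
    return np.diag(np.sqrt(np.arange(1, n_max + 1)), k=1)

def dimer_hamiltonian(E_e, J, Delta, K, n_max=2):
    """Eq. (27) for two identical dyes, each truncated at n_max excitons.

    Basis state |n1, n2> has index n1 * (n_max + 1) + n2.
    Returns the Hamiltonian and the total exciton number operator.
    """
    b = truncated_annihilation(n_max)
    eye = np.eye(n_max + 1)
    B1, B2 = np.kron(b, eye), np.kron(eye, b)
    B1d, B2d = B1.T, B2.T  # the matrices are real, so the adjoint is the transpose
    H = E_e * (B1d @ B1 + B2d @ B2)
    H += J * (B1d @ B2 + B2d @ B1)
    H += 0.5 * Delta * (B1d @ B1d @ B1 @ B1 + B2d @ B2d @ B2 @ B2)
    H += K * (B1d @ B2d @ B2 @ B1)
    number = B1d @ B1 + B2d @ B2
    return H, number

# Illustrative (assumed) parameter values in eV.
H, number = dimer_hamiltonian(E_e=2.0, J=0.1, Delta=-0.05, K=0.03)
vacuum = np.zeros(H.shape[0])
vacuum[0] = 1.0
commutator = H @ number - number @ H

print(np.allclose(H @ vacuum, 0.0))   # the ground state has energy 0, Eq. (31)
print(np.allclose(commutator, 0.0))   # H conserves the number of excitons
```

With J set to zero the diagonal entries can be read off directly: the state with two excitons on one dye has energy $2E^e + \Delta$, and the state with one exciton on each dye has energy $2E^e + K$, in line with the interpretation of $\Delta_m$ and $K_{(m,n)}$ given above.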


5 Energy Eigenvalues of a Homodimer Dye Aggregate 
and Davydov Splitting 


Here, as an example of how Frenkel exciton computations are carried out, the energy
eigenvalues and eigenvectors of a dye aggregate consisting of two identical dyes
(a homodimer) are found for the case in which the aggregate contains one exciton. It
will be shown that, as a result of exciton exchange, the absorption spectrum of the
dimer exhibits peak splitting, a phenomenon referred to as exciton splitting [39] or
Davydov splitting [40].

Consider the case when the dye aggregate consists of two identical dye molecules.
The Hamiltonian Eq. (27) then reduces to [41]

$$
\begin{aligned}
H ={}& E^e \left( B^\dagger_1 B_1 + B^\dagger_2 B_2 \right) + J \left( B^\dagger_1 B_2 + B^\dagger_2 B_1 \right) \\
&+ \frac{\Delta}{2} \left( B^\dagger_1 B^\dagger_1 B_1 B_1 + B^\dagger_2 B^\dagger_2 B_2 B_2 \right) + K B^\dagger_1 B^\dagger_2 B_2 B_1.
\end{aligned} \tag{38}
$$


The Hamiltonian of Eq. (27) and, consequently, of Eq. (38) is exciton number
conserving. That these Hamiltonians cannot change the number of excitons follows from
the fact that in each term of the Hamiltonian each creation operator is paired with an
annihilation operator and from the commutation relations for the exciton creation and
annihilation operators. As a consequence, the energy eigenvalues can be found by working
with state spaces having a fixed number of excitons. The system ground state $|0\rangle$ is the
zero-exciton example of this, as it is an energy eigenstate.

From Eq. (35), the set of one-exciton states in the site basis is

$$
\mathcal{B}_1 = \left\{ B^\dagger_1 |0\rangle,\; B^\dagger_2 |0\rangle \right\}. \tag{39}
$$

These two states are the states in which the exciton occupies molecule 1 or molecule
2, respectively. The set of two-exciton states in the site basis is

$$
\mathcal{B}_2 = \left\{ \frac{B^\dagger_1 B^\dagger_1}{\sqrt{2}} |0\rangle,\; B^\dagger_1 B^\dagger_2 |0\rangle,\; \frac{B^\dagger_2 B^\dagger_2}{\sqrt{2}} |0\rangle \right\}. \tag{40}
$$

The leftmost state of this set is that for which both excitons reside on molecule 1.
The middle state is that for which one exciton resides on molecule 1 and the other on
molecule 2. The rightmost state is that for which both excitons reside on molecule
2. Note that due to the commutation relation Eq. (33), the state $B^\dagger_1 B^\dagger_2 |0\rangle$ and the
state $B^\dagger_2 B^\dagger_1 |0\rangle$ are the same state. Thus, only one appears in the set. The formalism,
via the commutation relations Eqs. (33) and (34), thus has the indistinguishability
of excitons built into it. Indistinguishable particles for which these commutation
relations apply satisfy Bose statistics and are referred to as bosons.

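To determine the energy eigenstates and eigenvalues of Hamiltonian Eq. (38) it
is useful to evaluate the expressions $B^\dagger_r B_s B^\dagger_t |0\rangle$ and
$B^\dagger_r B^\dagger_s B_r B_s B^\dagger_t |0\rangle$, where r, s and
t are integer site labels. From the commutation relation Eq. (34) one has

$$
B_s B^\dagger_t - B^\dagger_t B_s = \delta_{s,t} \tag{41}
$$

or

$$
B_s B^\dagger_t = \delta_{s,t} + B^\dagger_t B_s. \tag{42}
$$

One thus has

$$
B^\dagger_r B_s B^\dagger_t |0\rangle = B^\dagger_r \left[ \delta_{s,t} + B^\dagger_t B_s \right] |0\rangle = \delta_{s,t} B^\dagger_r |0\rangle \tag{43}
$$

and

$$
B^\dagger_r B^\dagger_s B_r B_s B^\dagger_t |0\rangle = B^\dagger_r B^\dagger_s B_r \left[ \delta_{s,t} + B^\dagger_t B_s \right] |0\rangle = 0, \tag{44}
$$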


where in each case the last equality follows from Eq. (30). A consequence of the
last equation is that the terms with the $\Delta$ and K coefficients of Eq. (38) are zero in the
one-exciton sector of the state space, as one would expect given that these terms account
for exciton-exciton interactions. Using these last two equations one finds

$$
H B^\dagger_1 |0\rangle = E^e B^\dagger_1 |0\rangle + J B^\dagger_2 |0\rangle \tag{45}
$$

and

$$
H B^\dagger_2 |0\rangle = J B^\dagger_1 |0\rangle + E^e B^\dagger_2 |0\rangle. \tag{46}
$$

The general one-exciton state vector has the form

$$
|\psi\rangle = \alpha_1 B^\dagger_1 |0\rangle + \alpha_2 B^\dagger_2 |0\rangle. \tag{47}
$$


See Eqs. (4) and (12). Hence, in matrix form one has

$$
H |\psi\rangle = \begin{pmatrix} E^e & J \\ J & E^e \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix}. \tag{48}
$$

Equation (29) becomes

$$
E_k \begin{pmatrix} \alpha_{k,1} \\ \alpha_{k,2} \end{pmatrix} = \begin{pmatrix} E^e & J \\ J & E^e \end{pmatrix} \begin{pmatrix} \alpha_{k,1} \\ \alpha_{k,2} \end{pmatrix}. \tag{49}
$$

This eigenvalue-eigenvector equation can be solved by brute force
using general linear algebra techniques. The clever approach, however, is to note that
the Hamiltonian Eq. (38) remains unchanged (is symmetric) under the interchange
of subscripts 1 and 2 on the creation and annihilation operators, a consequence
of having chosen the two dyes to be identical. In this case it follows from group
representation theory of the permutation group that the eigenstates must be symmetric
or antisymmetric (changing sign) under the interchange of the site subscripts. Hence,
the eigenstates can be immediately written down:

$$
|S\rangle = \frac{1}{\sqrt{2}} \left( B^\dagger_1 |0\rangle + B^\dagger_2 |0\rangle \right) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ 1 \end{pmatrix} \tag{50}
$$

and

$$
|A\rangle = \frac{1}{\sqrt{2}} \left( B^\dagger_1 |0\rangle - B^\dagger_2 |0\rangle \right) = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 \\ -1 \end{pmatrix}, \tag{51}
$$

where the state labels S and A indicate whether the state is symmetric or
antisymmetric under interchange of the site labels. The corresponding energy eigenvalues
are then immediately obtained by substitution of the eigenstates into Eq. (49). One
finds

$$
E_S = E^e + J \tag{52}
$$

and

$$
E_A = E^e - J. \tag{53}
$$

When the exchange energy J is zero, such as when the dyes are far apart, the energies
become degenerate with both eigenstates having the energy $E^e$. This is the energy
of the photon that by absorption in a dye molecule induces a transition from the
ground state to its lowest optically allowed excited state. When the exchange energy
is nonzero, the energies of the two eigenstates differ by 2J. The transition from the
ground state $|0\rangle$ to the excited state $|S\rangle$ is induced by a photon having the energy
$E^e + J$. The transition from the ground state to the excited state $|A\rangle$ is induced by
a photon having the energy $E^e - J$. The absorption spectrum of the dimer will thus
exhibit two peaks, whereas a single dye molecule would
exhibit a single absorption peak. This peak splitting is called Davydov splitting and is
of size 2J. Figure 2 provides an experimental example of Davydov splitting observed
for two “squaraine-rotaxane” dyes confined to the core of a DNA Holliday junction
by covalent linkages [42]. One sees that the absorbance peak of this dimer aggregate
is split into two peaks, one on either side of the monomer (single dye) absorbance
peak.

Note that for the energy eigenstates Eqs. (50) and (51), the probability amplitude 
is of equal magnitude for the exciton to reside on dye 1 and dye 2. The exciton 
acts as if it has a simultaneous existence on both dyes. This is referred to as exciton 
delocalization. A classical particle would not be able to do this because it can only 
have one position at a time. 
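
The "brute force" linear algebra route mentioned above gives the same answer as the symmetry argument. The following is a minimal sketch; the function name and the parameter values (in eV) are assumptions chosen for illustration, with J taken positive.

```python
import numpy as np

def homodimer_one_exciton(E_e, J):
    """Diagonalize the one-exciton matrix of Eq. (48) in the site basis."""
    H = np.array([[E_e, J], [J, E_e]], dtype=float)
    energies, states = np.linalg.eigh(H)  # eigenvalues in ascending order
    return energies, states

# Illustrative (assumed) values in eV.
E_e, J = 2.0, 0.1
energies, states = homodimer_one_exciton(E_e, J)

splitting = energies[1] - energies[0]     # Davydov splitting, 2J
site_probabilities = np.abs(states) ** 2  # column k holds eigenstate k

print(energies)
print(splitting)
print(site_probabilities)
```

For J > 0 the lower eigenvalue belongs to the antisymmetric state of Eq. (51) and the upper one to the symmetric state of Eq. (50). In both cases the probability of finding the exciton on either dye is 1/2, which is the delocalization described above.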


6 Coherent Exciton Hopping 


The energy eigenstates at any instant of time make a perfectly good basis set with
which to express a quantum state. This basis set is particularly convenient to work
with when considering the time evolution of a system. Any state $|\psi(0)\rangle$ can be
expressed in the form

$$
|\psi(0)\rangle = \sum_{k=1}^{N} \alpha_k |E_k\rangle, \tag{54}
$$

where $|E_k\rangle$ are the energy eigenstates obtained by solving Eq. (29). Using this as the
initial condition for the state $|\psi\rangle$ appearing in the Schrödinger equation Eq. (7) the
solution to the Schrödinger equation is given by

$$
|\psi(t)\rangle = \sum_{k=1}^{N} \alpha_k e^{-iE_k t/\hbar} |E_k\rangle. \tag{55}
$$

Fig. 2 A dimer dye aggregate whose absorbance spectrum exhibits Davydov splitting. The
aggregate consists of two “squaraine-rotaxane” dyes (SeTau-670 from SETA BioMedicals) confined to
the core of a DNA Holliday junction by covalent linkages, as shown schematically in the top panel.
In the bottom panel, the absorbance spectrum of the monomer dye shows a single peak, whereas
that of the dimer aggregate shows two peaks, one on either side of the monomer absorbance peak.
Based on absorbance and circular dichroism data it was determined that the dyes make an angle of
about 85° with respect to each other, a configuration referred to as oblique. Figure panels are
modified from Barclay et al. [42]

As an example of the time dependence that a homodimer aggregate may exhibit,
consider the case when at t = 0 the state vector is given by

$$
|\psi(0)\rangle = \frac{1}{\sqrt{2}} \left( |S\rangle + |A\rangle \right). \tag{56}
$$

Substituting Eqs. (50) and (51) into this equation yields

$$
|\psi(0)\rangle = B^\dagger_1 |0\rangle. \tag{57}
$$


The initial state Eq. (56) is the state in which the exciton resides only on dye 1.
Such a state can be prepared for a dye aggregate in which the two dye molecules
have different orientations. In this case, the polarization of a femtosecond laser light
pulse can be oriented so that it is orthogonal to the dipole component of the transition
charge density of dye 2 but not that of dye 1. In this manner, the laser light pulse
can only induce an optical transition from the ground state to the lowest optically
allowed excited state in dye 1 [18].
As indicated by Eq. (55), the time evolution of this state is given by

$$
|\psi(t)\rangle = \frac{1}{\sqrt{2}} \left( e^{-iE_S t/\hbar} |S\rangle + e^{-iE_A t/\hbar} |A\rangle \right). \tag{58}
$$

Substitution of Eqs. (50) through (53) into this equation yields

$$
|\psi(t)\rangle = e^{-iE^e t/\hbar} \left[ \cos\left( \frac{Jt}{\hbar} \right) B^\dagger_1 |0\rangle - i \sin\left( \frac{Jt}{\hbar} \right) B^\dagger_2 |0\rangle \right]. \tag{59}
$$

From this, it is evident that the probability of finding the exciton on dye 1 as a function
of time is

$$
P_1(t) = \cos^2\left( \frac{Jt}{\hbar} \right), \tag{60}
$$

whereas the probability of finding the exciton on dye 2 is given by

$$
P_2(t) = \sin^2\left( \frac{Jt}{\hbar} \right). \tag{61}
$$

Hence, at the instants of time $t = n\pi\hbar/J$, where n is an integer, the exciton resides
entirely on dye 1; at the instants of time $t = (n + 1/2)\pi\hbar/J$, the exciton resides entirely
on dye 2. The exciton thus hops back and forth between the two dyes with a period
of $\pi\hbar/J$. Optical experiments using femtosecond light pulses are able to reveal such
coherent oscillations. This phenomenon is referred to as exciton coherence.
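
The hopping probabilities of Eqs. (60) and (61) can be reproduced by evolving the initial state of Eq. (57) in the energy eigenbasis, following Eqs. (54) and (55). This is a sketch under stated assumptions: energies are in eV, times in femtoseconds, the function name is hypothetical, and the values of $E^e$ and J are illustrative.

```python
import numpy as np

HBAR_EV_FS = 0.6582119569  # reduced Planck constant in eV*fs

def dimer_site_probabilities(E_e, J, t_fs):
    """Probabilities of finding the exciton on dye 1 and dye 2 at time t_fs."""
    H = np.array([[E_e, J], [J, E_e]], dtype=float)
    energies, states = np.linalg.eigh(H)
    psi0 = np.array([1.0, 0.0], dtype=complex)  # exciton on dye 1, Eq. (57)
    coeffs = states.conj().T @ psi0             # expansion coefficients, Eq. (54)
    phases = np.exp(-1j * energies * t_fs / HBAR_EV_FS)
    psi_t = states @ (phases * coeffs)          # time-evolved state, Eq. (55)
    return np.abs(psi_t) ** 2

# Illustrative (assumed) values in eV.
E_e, J = 2.0, 0.1
t_half = 0.5 * np.pi * HBAR_EV_FS / J  # first time the exciton is fully on dye 2

print(dimer_site_probabilities(E_e, J, 0.0))
print(dimer_site_probabilities(E_e, J, t_half))
print(dimer_site_probabilities(E_e, J, 2.0 * t_half))
```

For the assumed J of 100 meV the transfer time `t_half` is roughly 10 fs, which indicates why femtosecond pulses are needed to resolve the oscillation.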

Having discussed exciton delocalization and exciton coherence, we now move on
to exciton devices.


7 Exciton Transmission Lines


For a many-particle quantum walk-based quantum computer, a means is required
to transport particles from the output of one gate to the input of the next. Here it is
shown that a dye aggregate consisting of a linear array of molecules can function as
an exciton transmission line and thus can serve as a wire connecting gates [43].
For simplicity, consider an infinite array of identical, equally spaced dye molecules
and the case when only one exciton is present. One can then drop the exciton-exciton
interaction terms of the Hamiltonian Eq. (27) and the Hamiltonian becomes

$$
H = E^e \sum_{r=-\infty}^{\infty} B^\dagger_r B_r + J \sum_{r=-\infty}^{\infty} \left( B^\dagger_{r+1} B_r + B^\dagger_r B_{r+1} \right), \tag{62}
$$

where, for simplicity, all but nearest neighbor interactions have also been neglected,
an approximation that is generally satisfactory because $J_{(m,n)}$ falls off as the reciprocal
of the cube of the distance between dyes m and n.

The system is invariant under translation by the lattice spacing. Group
representation theory then indicates that the one-exciton energy eigenstates must have the
Bloch form

$$
|k\rangle = \frac{1}{\sqrt{2\pi}} \sum_{r=-\infty}^{\infty} e^{ikr} B^\dagger_r |0\rangle, \tag{63}
$$

where k is a real number restricted to the range $-\pi < k \le \pi$ due to the periodic
nature of the functions $e^{ikr}$. Applying the Hamiltonian Eq. (62) to this state yields

$$
H |k\rangle = \left[ E^e + 2J \cos(k) \right] |k\rangle, \tag{64}
$$

demonstrating directly that $|k\rangle$ is an eigenstate of the Hamiltonian Eq. (62) having the
energy eigenvalue

$$
E_k = E^e + 2J \cos(k). \tag{65}
$$
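
The dispersion relation of Eq. (65) can be checked numerically on a finite system. The sketch below swaps the infinite array for a periodic ring of dyes, which is an assumption made only so that the Hamiltonian becomes a finite matrix; on a ring the allowed wave vectors are the discrete values $k = 2\pi n / N$. Function names and parameter values are illustrative.

```python
import numpy as np

def ring_hamiltonian(n_dyes, E_e, J):
    """One-exciton matrix of Eq. (62) on a periodic ring of n_dyes >= 3 dyes."""
    H = E_e * np.eye(n_dyes)
    for r in range(n_dyes):
        s = (r + 1) % n_dyes  # nearest neighbor, wrapping around the ring
        H[r, s] += J
        H[s, r] += J
    return H

def bloch_vector(n_dyes, k):
    """Site amplitudes e^{ikr} of the Bloch form Eq. (63), normalized on the ring."""
    r = np.arange(n_dyes)
    return np.exp(1j * k * r) / np.sqrt(n_dyes)

# Illustrative (assumed) values: 12 dyes, energies in eV.
n_dyes, E_e, J = 12, 2.0, 0.1
H = ring_hamiltonian(n_dyes, E_e, J)
allowed_k = 2.0 * np.pi * np.arange(n_dyes) / n_dyes

numeric = np.sort(np.linalg.eigvalsh(H))
analytic = np.sort(E_e + 2.0 * J * np.cos(allowed_k))
print(np.allclose(numeric, analytic))

# A Bloch vector with an allowed k is an eigenvector, as in Eq. (64).
k = allowed_k[3]  # k = pi/2 for n_dyes = 12
v = bloch_vector(n_dyes, k)
print(np.allclose(H @ v, (E_e + 2.0 * J * np.cos(k)) * v))
```

As the number of dyes grows, the allowed k values fill the band continuously and the ring spectrum approaches that of the infinite line.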


The general one-exciton state for an exciton residing on the dye array has the form

$$
|\psi\rangle = \frac{1}{\sqrt{2\pi}} \int_{-\pi}^{\pi} dk\, f(k)\, e^{-iE_k t/\hbar} |k\rangle, \tag{66}
$$

where f(k) is a complex function of k. This is a generalization of Eq. (55) for the
case when the state label is a continuous variable rather than a member of a discrete
set. Introducing the frequencies

$$
\omega_k = \frac{E_k}{\hbar}, \quad \omega_e = \frac{E^e}{\hbar}, \quad \text{and} \quad \omega_J = \frac{J}{\hbar}, \tag{67}
$$

the dispersion relation Eq. (65) can be written as

$$
\omega_k = \omega_e + 2\omega_J \cos(k) \tag{68}
$$

and Eq. (66), with the aid of Eq. (63), can be put into the form

$$
|\psi\rangle = \frac{1}{2\pi} \sum_{r=-\infty}^{\infty} \int_{-\pi}^{\pi} dk\, f(k)\, e^{-i(\omega_k t - kr)} B^\dagger_r |0\rangle. \tag{69}
$$

In this form, it is evident that the general one-exciton state consists of a wave packet
of waves with an oscillatory function of time and position along the dye array given
by $e^{-i(\omega_k t - kr)}$. Thus, the exciton propagates along the transmission line in a wave-like
manner.

When f(k) is strongly peaked with a narrow width about a particular k, the concept
of wave packet velocity (known as group velocity) becomes meaningful and is given
by

$$
v_g = a \frac{d\omega_k}{dk}, \tag{70}
$$

where a is the lattice spacing (the nearest neighbor distance between the dyes). From
Eq. (68) one obtains

$$
v_g = -2\omega_J a \sin(k). \tag{71}
$$

The magnitude of the group velocity is greatest when

$$
k = \pm\frac{\pi}{2} \tag{72}
$$

and has the value $2\omega_J a$. This is the speed limit for signals propagating along the
transmission line.

Because an exciton wave with definite k has the oscillatory form $e^{-i(\omega_k t - kr)}$,
one sees that at the group velocity maximum the wavelength of the exciton is four
lattice units long. A quarter wavelength or $\pi/2$ phase shift is present between two
neighboring dyes at any instant of time. This observation will play a significant role
in the discussion of exciton-based basis change gates.
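
The group velocity of Eq. (71) and its maximum at the band center, Eq. (72), can be illustrated numerically. In this sketch the energies (eV), the lattice spacing of 1 nm, and the function names are assumptions chosen only to produce concrete numbers; times are in femtoseconds.

```python
import numpy as np

HBAR_EV_FS = 0.6582119569  # reduced Planck constant in eV*fs

def omega_k(k, omega_e, omega_J):
    """Dispersion relation of Eq. (68)."""
    return omega_e + 2.0 * omega_J * np.cos(k)

def group_velocity(k, omega_J, a):
    """Group velocity of Eq. (71)."""
    return -2.0 * omega_J * a * np.sin(k)

# Illustrative (assumed) values: energies in eV, lattice spacing in nm.
E_e, J, a_nm = 2.0, 0.1, 1.0
omega_e, omega_J = E_e / HBAR_EV_FS, J / HBAR_EV_FS  # rad/fs

k = np.linspace(-np.pi, np.pi, 2001)
v_analytic = group_velocity(k, omega_J, a_nm)
# Numerical derivative of the dispersion relation, following Eq. (70).
v_numeric = a_nm * np.gradient(omega_k(k, omega_e, omega_J), k)

k_fastest = k[np.argmax(np.abs(v_analytic))]
v_max = np.max(np.abs(v_analytic))  # equals 2 * omega_J * a at |k| = pi/2
print(abs(k_fastest), v_max)
```

The numerical derivative agrees with Eq. (71) away from the end points of the grid, and the largest speed occurs at $|k| = \pi/2$, where neighboring dyes differ in phase by a quarter cycle.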


8 Representation of an Exciton Qubit 


In the design of quantum computers, a decision needs to be made on how information
is to be encoded in the physical hardware. An obvious choice for excitons undergoing
a many-particle quantum walk over a dye aggregate would be to let the absence of an
exciton denote a logical 0 and the presence of an exciton denote a logical 1. Because
the exciton can exist in a superposition of being present or absent, this is a qubit.
This is the representation that Childs et al. [20] chose in their proof that universal
quantum computation can be implemented in Bose-Hubbard systems.

Here an alternative encoding will be employed in which a qubit is carried by two
transmission lines, say lines 1 and 2. If the exciton is on line 1, that is to be regarded
as a logical 0. If it is on line 2, that is to be regarded as a logical 1. This is referred
to as the “dual-rail” mode of operation. The exciton can be in a superposition state
for which there is a probability amplitude $\alpha_1$ of its being on line 1 and a probability
amplitude $\alpha_2$ of its being on line 2, so this coding indeed implements a qubit. This
dual-rail representation of a qubit enables the simplification of gate design but at the
cost of having twice as many wires (exciton transmission lines) connecting the gates.


9 Basis Change Gates 


We now consider aggregates that function as quantum gates. These aggregates are 
connected to exciton transmission lines that supply the input signals and deliver the 
output. A general mathematical framework is developed before considering specific 
gates. 

Equation (63) suggests that one introduce the annihilation operator

$$
B(k) = \frac{1}{\sqrt{2\pi}} \sum_{r=-\infty}^{\infty} e^{-ikr} B_r. \tag{73}
$$

Its Hermitian conjugate is a creation operator which, when operating on $|0\rangle$, produces
a one-exciton state with wave vector k and frequency $\omega_k$. It is an energy eigenstate
with the energy eigenvalue $E_k = \hbar\omega_k$ given by Eq. (65). A consequence of the infinite
sum in Eq. (73) and the continuous nature of the index k is that these creation and
annihilation operators satisfy the commutation relations

$$
[B(k), B(k')] = 0 \tag{74}
$$

and

$$
[B(k), B^\dagger(k')] = \delta(k - k'), \tag{75}
$$

where $\delta(k - k')$ is a functional referred to as the Dirac delta function.

The exciton of state $B^\dagger(-k)|0\rangle$ propagates in the opposite direction from the
exciton of state $B^\dagger(k)|0\rangle$.

We now consider the case of a one-qubit gate. In the dual-rail representation, this
gate will have two transmission lines carrying the input qubit and two transmission
lines carrying the output qubit. The situation is depicted in Fig. 3. Because signals
can propagate both ways on each transmission line, we must consider eight
annihilation operators. We employ the labeling $B^{\alpha}_{\ell}(k)$, where the superscript
$\alpha \in \{\mathrm{in}, \mathrm{out}\}$ indicates whether the exciton is propagating “in” toward the gate G
or “out” away from the gate. The subscript $\ell \in \{0, 1\}$ indicates whether the exciton
resides on the logical 0 or the logical 1 transmission line of the qubit, and the
argument k indicates the value of the wave vector.

Fig. 3 A general one-qubit gate G connected to four exciton transmission lines. The transmission
lines consist of arrays of dyes that are here represented by circles placed along a line

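Consider the case when all the transmission lines have the same $E^e$ and J. The
exciton associated with the exciton creation operator $B^{\mathrm{in}\dagger}_1(k)$, after arriving at the
gate G, will exit through one of the four outputs associated with $B^{\mathrm{out}}_1(k)$, $B^{\mathrm{out}}_0(k)$,
$B^{\mathrm{out}}_1(-k)$ and $B^{\mathrm{out}}_0(-k)$.
Because energy is conserved and the transmission lines are identical, if the incoming
exciton has wave vector k, the outgoing exciton can only have the wave vector k
or $-k$. Hence, one need only consider the annihilation operators shown in Fig. 3.
One has a similar situation with excitons entering the other input ports. The relation
between the input and output annihilation operators is given by

$$
\begin{pmatrix} B^{\mathrm{out}}_1(k) \\ B^{\mathrm{out}}_0(k) \\ B^{\mathrm{out}}_1(-k) \\ B^{\mathrm{out}}_0(-k) \end{pmatrix}
= \begin{pmatrix} S_{11} & S_{12} & S_{13} & S_{14} \\ S_{21} & S_{22} & S_{23} & S_{24} \\ S_{31} & S_{32} & S_{33} & S_{34} \\ S_{41} & S_{42} & S_{43} & S_{44} \end{pmatrix}
\begin{pmatrix} B^{\mathrm{in}}_1(k) \\ B^{\mathrm{in}}_0(k) \\ B^{\mathrm{in}}_1(-k) \\ B^{\mathrm{in}}_0(-k) \end{pmatrix}. \tag{76}
$$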

The matrix elements $S_{m,n}$ are, in general, complex numbers. The square matrix
containing the $S_{m,n}$ is referred to as a scattering matrix. Let this matrix be denoted by
S. Because the incoming signals are independent, the “in” creation and annihilation
operators must satisfy commutation relations similar to those of Eqs. (74) and (75):

$$
[B^{\mathrm{in}}_{\ell}(k), B^{\mathrm{in}}_{\ell'}(k')] = 0 \tag{77}
$$

and

$$
[B^{\mathrm{in}}_{\ell}(k), B^{\mathrm{in}\dagger}_{\ell'}(k')] = \delta_{\ell,\ell'}\, \delta(k - k'). \tag{78}
$$

Similarly, the outgoing signals are independent of each other and
consequently the “out” creation and annihilation operators also satisfy commutation
relations of the form Eqs. (77) and (78). For all these commutation relations to be satisfied,
the matrix S must be unitary. That S be unitary is also required by the conservation
of energy and is a manifestation of the unitary evolution imposed by the Schrödinger
equation.

When the device simply consists of two parallel transmission lines that are
sufficiently far apart that exciton hopping from one transmission line to the other does
not occur, there is no scattering and the S matrix is diagonal:

$$
S = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \tag{79}
$$

When two exciton transmission lines are brought sufficiently close that the oscillatory
Coulomb interaction can occur between dye pairs, with one dye located
on each transmission line, an exciton can hop from one transmission line to the other.
Nondiagonal entries of the S matrix thereby become nonzero.

To make a specific quantum gate, such as a Hadamard gate, the challenge is to
engineer a dye aggregate that produces the desired matrix elements $S_{m,n}$. For this,
the electrical engineering literature on distributed element circuits serves as a useful
guide. These radio frequency and microwave circuits are based on wave interference.
In these circuits, transmission line segments a quarter of a wavelength long play a
prominent role. One such device is shown in Fig. 4a. It is an example of a branch line
coupler [44]. Signals entering port 1 of this coupler exit ports 2 and 3, and no signal
exits port 4. For the case shown, the transmission line segments that are narrow have
a transmission line impedance of $Z_0$, whereas the two transmission line segments
that are wide have a transmission line impedance of $Z_0/\sqrt{2}$. With these impedance
values, the signal entering port 1 is split evenly between ports 2 and 3, that is, the
device functions as a 50/50 beam splitter. Because distributed element circuits rely on
wave interference, they generally function well only over a limited range of wavelengths
(or frequencies) centered about a midband wavelength (or frequency) determined by
the length of the transmission line segments.

The device shown in Fig. 4b is a direct translation of the device of Fig. 4a into
an exciton device. Because there is a quarter wavelength shift in the phase between
neighboring dyes at the band center $k = \pi/2$, the distance between the dyes in effect
serves as a quarter wavelength section of transmission line. The energy parameter
J plays the role of the reciprocal of impedance. Because the value of J depends on
the spacing between dyes, the dye spacing can be adjusted to yield the desired value.
In this case, the coupling between all nearest neighbor dyes is J except for the two
having the value $\sqrt{2}J$, as indicated in the figure by arrows.

Fig. 4 A branch line coupler as implemented in radio and microwave frequency electronics (a)
and as an exciton device implemented using dye molecules (b). For the values of the impedance
or exciton exchange energies shown, these devices act as 50/50 power dividers. The exciton device
(b) functions as a basis change gate

At band center, the scattering matrix Eq. (76) for this device is

$$
S = \frac{1}{\sqrt{2}} \begin{pmatrix} i & 1 & 0 & 0 \\ 1 & i & 0 & 0 \\ 0 & 0 & i & 1 \\ 0 & 0 & 1 & i \end{pmatrix}. \tag{80}
$$


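From the zeros in this matrix, it is evident that signals entering ports $B_1^{\mathrm{in}}$ and $B_0^{\mathrm{in}}$
are only delivered to ports $B_1^{\mathrm{out}}$ and $B_0^{\mathrm{out}}$. Hence, the device can be regarded as a
one-qubit gate in which the input enters the left side of the device and the output
exits the right side as shown in Fig. 4b, where the logical 1 and logical 0 lines have
been indicated. The transformation performed on the annihilation operators is

$$
\begin{pmatrix} B_1^{\mathrm{out}} \\ B_0^{\mathrm{out}} \end{pmatrix}
= \frac{1}{\sqrt{2}}
\begin{pmatrix} i & 1 \\ 1 & i \end{pmatrix}
\begin{pmatrix} B_1^{\mathrm{in}} \\ B_0^{\mathrm{in}} \end{pmatrix}. \tag{81}
$$

This transformation can be inverted to express the annihilation operators of the incoming
signals in terms of the annihilation operators of the outgoing signals:

$$
\begin{pmatrix} B_1^{\mathrm{in}} \\ B_0^{\mathrm{in}} \end{pmatrix}
= \frac{1}{\sqrt{2}}
\begin{pmatrix} -i & 1 \\ 1 & -i \end{pmatrix}
\begin{pmatrix} B_1^{\mathrm{out}} \\ B_0^{\mathrm{out}} \end{pmatrix}. \tag{82}
$$

Consider the case when the state of the system $|\psi_s\rangle$ is that for which the incoming
signal is a logical 1:

$$
|\psi_s\rangle = B_1^{\mathrm{in}\dagger}|0\rangle. \tag{83}
$$

From Eq. (82), one has

$$
B_1^{\mathrm{in}} = \frac{1}{\sqrt{2}}\left(B_0^{\mathrm{out}} - i\,B_1^{\mathrm{out}}\right). \tag{84}
$$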


Substituting this into Eq. (83) and remembering that taking the Hermitian adjoint
involves complex conjugation, one obtains

$$
|\psi_s\rangle = \frac{1}{\sqrt{2}}\left(B_0^{\mathrm{out}\dagger}|0\rangle + i\,B_1^{\mathrm{out}\dagger}|0\rangle\right). \tag{85}
$$

Hence, this system state is one for which the exciton exits the gate as a superposition
state in which the probability amplitude that the exciton is on the logical 0 line is
1/√2 and the probability amplitude that the exciton is on the logical 1 line is i/√2.

The gate of Fig. 4b is capable of exhibiting the ideal performance of Eq. (81) at the
transmission line band center k = ±π/2; however, its performance degrades away
from band center. But the degradation is graceful in that there is a finite bandwidth
over which the device functions satisfactorily for any specified tolerance level. A full
analysis of this gate is presented in [43]. Greater bandwidth than that exhibited by
the gates of Fig. 4 can be achieved with more complex gate designs, the theory of
which is well developed for distributed element circuits.
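The 50/50 splitting at band center can be checked numerically. The following Python sketch (standard library only) assumes that the basis change matrix has the form (1/√2)[[i, 1], [1, i]] in the ordering (logical 1, logical 0); that reading of Eq. (81), and all helper names, are assumptions made for illustration. It verifies unitarity and computes the exit probabilities for an exciton entering on the logical 1 line.

```python
import math

INV_SQRT2 = 1 / math.sqrt(2)

# Assumed band-center basis change matrix, ordered (logical 1, logical 0).
U_B = [[1j * INV_SQRT2, INV_SQRT2],
       [INV_SQRT2, 1j * INV_SQRT2]]


def matmul(a, b):
    """Multiply two square complex matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def dagger(a):
    """Hermitian adjoint (conjugate transpose) of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]


# Unitarity check: U_B^dagger U_B should equal the identity.
identity_check = matmul(dagger(U_B), U_B)

# For an exciton entering on the logical 1 line, the output amplitudes are
# the first column of U_B: (amplitude on line 1, amplitude on line 0).
amp_line1, amp_line0 = U_B[0][0], U_B[1][0]
prob_line1, prob_line0 = abs(amp_line1) ** 2, abs(amp_line0) ** 2

print("U_B^dagger U_B =", identity_check)
print("P(line 1) =", prob_line1, " P(line 0) =", prob_line0)
```

The two amplitudes differ by a factor of i, the quarter-wave phase shift characteristic of a branch line coupler, while the probabilities are equal.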


10 Phase Gates 


Phase shifts can be implemented as propagation delays. Figure 5 illustrates two
transmission lines along which a qubit in the dual-rail representation propagates. For the
phase gate shown in Fig. 5, one line has been made one dye longer than the other.
For signals propagating at the midband of the transmission line, this extra length
induces a quarter wavelength propagation delay (a π/2 phase shift) with respect to
the shorter transmission line. The transformation performed by this gate is given by

$$
\begin{pmatrix} B_1^{\mathrm{out}} \\ B_0^{\mathrm{out}} \end{pmatrix}
=
\begin{pmatrix} e^{i\pi/2} & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} B_1^{\mathrm{in}} \\ B_0^{\mathrm{in}} \end{pmatrix}. \tag{86}
$$

Propagation delays can be induced by other means. From Eq. (63), it is evident
that the phase factor between neighboring dyes is $e^{ik}$, that is, the phase shift is k.
Taking the energy $E_k$ of the exciton (or the carrier frequency $\omega_k$ of the signal) to
be the controlling variable, the value of k is obtained by solving Eq. (65). One sees
that it depends on E° and J. Hence, phase shifts can be induced by having E° or J
differ in sections of one transmission line by employing different dyes (changes E°
and J) or by changing the spacing between neighboring dyes (changes J). We thus
posit that any phase gate of the form of Eq. (18) can be engineered.

Fig. 5 A phase gate consisting of two transmission lines. The phase shift results from the
propagation delay of the signal traveling over the longer transmission line. For the case shown,
where one transmission line is one dye longer than the other, the phase shift is π/2 at the
transmission line midband

It is now shown how the basis change gate exhibited in Eq. (81),

$$
U_B = \frac{1}{\sqrt{2}} \begin{pmatrix} i & 1 \\ 1 & i \end{pmatrix}, \tag{87}
$$

can be converted into a Hadamard gate by sandwiching it between two phase gates
of the form

$$
U_P = \begin{pmatrix} e^{-i\pi/4} & 0 \\ 0 & e^{i\pi/4} \end{pmatrix}. \tag{88}
$$

Carrying out the matrix multiplication

$$
U_H = U_P U_B U_P \tag{89}
$$

one finds that $U_H$ is the Hadamard gate Eq. (17).
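The matrix product can be verified directly. The sketch below is a minimal check in plain Python, assuming the basis change gate is (1/√2)[[i, 1], [1, i]], the phase gate is diag(e^{-iπ/4}, e^{iπ/4}), and Eq. (17) is the standard Hadamard matrix (1/√2)[[1, 1], [1, -1]]. These forms are assumptions about the equations, and the helper names are illustrative.

```python
import cmath
import math

INV_SQRT2 = 1 / math.sqrt(2)

# Assumed forms of the basis change gate and the sandwiching phase gate.
U_B = [[1j * INV_SQRT2, INV_SQRT2],
       [INV_SQRT2, 1j * INV_SQRT2]]
U_P = [[cmath.exp(-1j * math.pi / 4), 0],
       [0, cmath.exp(1j * math.pi / 4)]]

# Standard Hadamard matrix, assumed to be the content of Eq. (17).
HADAMARD = [[INV_SQRT2, INV_SQRT2],
            [INV_SQRT2, -INV_SQRT2]]


def matmul(a, b):
    """Multiply two square complex matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# U_H = U_P U_B U_P
U_H = matmul(matmul(U_P, U_B), U_P)

# Largest entrywise deviation from the Hadamard matrix.
max_dev = max(abs(U_H[i][j] - HADAMARD[i][j])
              for i in range(2) for j in range(2))
print("max deviation from Hadamard:", max_dev)
```

The phase gates rotate the diagonal entries i of the coupler matrix to +1 and -1 while leaving the off-diagonal entries unchanged, which is why no additional global phase appears.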

An alternative means of implementing a Hadamard gate would be to translate a 
hybrid ring coupler, also called a rat-race coupler [44], (a distributed element circuit 
device) into an exciton device as was done for the branch line coupler of Fig. 4. 


11 An Exciton Interferometer 


As an illustration of how single qubit gates can be composed to produce new single 
qubit gates, exciton interferometers will now be discussed. This will also lay the 
foundation for a discussion of the CNOT gate. An exciton interferometer can be 
constructed as a phase gate sandwiched between two basis change gates. As an
example, consider the phase gate of Eq. (18) sandwiched between two basis change
gates given by Eq. (87). The configuration is illustrated schematically in Fig. 6a as a
cascade of gates, and the physical layout of the dyes is shown in Fig. 6b.

Fig. 6 An exciton interferometer. a A schematic representation of the device as a cascade of a
branch line coupler gate, a phase gate and a second branch line coupler. b The physical layout of the
dyes forming an interferometer. The upper transmission line of the phase gate has been shaded to
indicate that its propagation delay (phase shift) may be different from that of the lower transmission
line

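The overall transformation is given by

$$
U_I = U_B U_P U_B. \tag{90}
$$

Carrying out the matrix multiplication yields

$$
U_I = i\,e^{i(\phi_1+\phi_2)/2}
\begin{pmatrix}
-\sin[(\phi_1-\phi_2)/2] & \cos[(\phi_1-\phi_2)/2] \\
\cos[(\phi_1-\phi_2)/2] & \sin[(\phi_1-\phi_2)/2]
\end{pmatrix}. \tag{91}
$$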

From this, it is evident that how an exciton entering one port of the interferometer
is distributed among the output ports depends sinusoidally on the phase difference
φ1 − φ2 of the phases of the phase gate. This composition of gates thus functions
as an interferometer that is sensitive to the phase difference between the two arms
(transmission lines) internal to the interferometer.

To simplify the discussion, consider the case when φ2 has the fixed value π. Then
Eq. (91) reduces to

$$
U_I(\phi) = -e^{i\phi/2}
\begin{pmatrix}
\cos(\phi/2) & \sin(\phi/2) \\
\sin(\phi/2) & -\cos(\phi/2)
\end{pmatrix}, \tag{92}
$$

where we have set φ1 = φ.




When φ = 0, this matrix reduces to

$$
U_I(0) = -\begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \tag{93}
$$

In this case, an exciton entering on the logical 1 line exits on the logical 1 line
and an exciton entering on the logical 0 line exits on the logical 0 line. Hence, for
Boolean inputs, the Boolean value remains unchanged as the qubit passes through
the interferometer with the setting φ = 0.

When φ = π, Eq. (92) reduces to

$$
U_I(\pi) = -i\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}. \tag{94}
$$

Now a qubit entering as a logical 1 exits as a logical 0 and a qubit entering as a logical
0 exits as a logical 1. Thus, with the phase φ set at π the interferometer acts as a
NOT gate. Now, if one had a means to switch φ between 0 and π, one would have a
controlled NOT gate. How such a switching element can be constructed is discussed
next.
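The switching behavior of the interferometer can be illustrated numerically. The sketch below, in plain Python, builds the coupler, phase gate, coupler cascade under the assumption that the basis change gate is (1/√2)[[i, 1], [1, i]] and that the phase gate of Eq. (18) is diag(e^{iφ1}, e^{iφ2}). Both forms and the function names are assumptions for illustration. The second arm phase is fixed at π, as in the text.

```python
import cmath
import math

INV_SQRT2 = 1 / math.sqrt(2)

# Assumed basis change gate, ordered (logical 1, logical 0).
U_B = [[1j * INV_SQRT2, INV_SQRT2],
       [INV_SQRT2, 1j * INV_SQRT2]]


def matmul(a, b):
    """Multiply two square complex matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def interferometer(phi1, phi2):
    """Return U_I = U_B U_P U_B with U_P = diag(exp(i*phi1), exp(i*phi2))."""
    u_p = [[cmath.exp(1j * phi1), 0],
           [0, cmath.exp(1j * phi2)]]
    return matmul(matmul(U_B, u_p), U_B)


def exit_probabilities(phi):
    """Probabilities (line 1, line 0) for an exciton entering on line 1,
    with the phase of the second arm fixed at pi."""
    u = interferometer(phi, math.pi)
    return abs(u[0][0]) ** 2, abs(u[1][0]) ** 2


for phi in (0.0, math.pi / 2, math.pi):
    p1, p0 = exit_probabilities(phi)
    print(f"phi = {phi:.3f}: P(line 1) = {p1:.3f}, P(line 0) = {p0:.3f}")
```

Under these assumptions the exit probabilities follow cos²(φ/2) and sin²(φ/2), so φ = 0 passes the Boolean value unchanged and φ = π inverts it.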


12 A Controlled Phase Shift 


To convert the interferometer of Fig.6 into a controlled gate, a controlled phase 
shifting element is needed in which one exciton controls the phase shift of another. 
This requires an exciton—exciton interaction. Here the changes in the static Coulomb 
interaction between dyes resulting from changes in the charge distribution, when a 
dye transitions from the ground state to the lowest optically allowed excited state, 
are utilized, that is, the Km,n interactions of Eq. (27) are employed.

A means is required to enable two excitons to interact in a controlled manner. 
A way to accomplish this is shown in Fig. 7. Shown are two parallel transmission 
lines that differ such that the group velocity of the upper transmission line is less 
than that of the lower transmission line. The dyes of the two transmission lines are 
oriented such that the exchange energy between dyes on separate transmission lines 
is zero. This prevents the transitioning of an exciton from one transmission line 
to the other. As discussed in the Appendix, since the exciton-exciton interaction
energy arises from a different mechanism than that of the exchange energy, the
inter-transmission line exciton-exciton interaction energy need not be zero even though
the exciton exchange energy is. By running the two transmission lines close to each
other the inter-transmission line strength K of the exciton-exciton interaction can
be made large.

For operation, exciton wave packets are introduced on both transmission lines;
however, the packet on the upper transmission line is introduced first to give it a head
start as shown in Fig. 7a. The exciton wave packet on the lower transmission line
catches up with the wave packet of the upper transmission line as shown in Fig. 7b
and then surpasses it as shown in Fig. 7c. Where the excitons are located in each wave
packet is unknown, but, because one wave packet completely overtakes the other, the
two excitons are guaranteed to interact. The interaction energy is short range but has
the value K when the two excitons are directly across from each other. The phase
winding induced by the interaction can be estimated using $e^{-iKt_I/\hbar}$, where $t_I$ is the
time interval over which the excitons interact. Let $J_1$ and $J_2$ denote the strength of
the hopping interaction for the upper and lower transmission line, respectively; then,
from the expression for the group velocity at midband Eq. (71) and from Eq. (67),
the magnitude of the group velocity difference is

$$
|\Delta v| = \frac{2a}{\hbar}\,|J_2 - J_1|. \tag{95}
$$

Because the excitons strongly interact only when they are within a lattice unit a of
each other, an estimate of the interaction time is

$$
t_I = \frac{a}{|\Delta v|} = \frac{\hbar}{2|J_2 - J_1|}. \tag{96}
$$

Fig. 7 A controlled phase shifting element. The device consists of two transmission lines which
have no inter-transmission line J coupling, but inter-transmission line K coupling, enabling an
exciton on one transmission line to change the phase of an exciton on the other transmission line.
The dyes of the upper transmission line are shaded to indicate that the signal propagation speed is
slower on that transmission line. That enables an exciton wave packet of the lower transmission
line to overtake an exciton wave packet of the upper transmission line, as shown at successive time
snapshots (a), (b) and (c). Thereby, the two excitons are ensured to interact regardless of where
each resides in its wave packet

Hence, the phase winding is $e^{-iK/(2|J_2 - J_1|)}$, that is, the phase accumulated during the
interaction is

$$
\phi_I = -\frac{K}{2|J_2 - J_1|}. \tag{97}
$$

The value of $\phi_I$ can be engineered through choice of K, $J_1$ or $J_2$. Let $|\Psi_{\mathrm{in}}\rangle$ denote
the state of the system before the interaction and $|\Psi_{\mathrm{out}}\rangle$ the state of the system after
the interaction. The relation between these two states is

$$
|\Psi_{\mathrm{out}}\rangle = e^{i\phi_I}|\Psi_{\mathrm{in}}\rangle. \tag{98}
$$

Because the incoming and outgoing states are both two-exciton states and the excitons
cannot transition from one transmission line to the other, the exciton-exciton
interaction can be expressed in terms of the incoming and outgoing annihilation
operators by

$$
B_1^{\mathrm{out}} B_2^{\mathrm{out}} = e^{i\phi_I} B_1^{\mathrm{in}} B_2^{\mathrm{in}}. \tag{99}
$$
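To get a feel for the magnitudes involved, the estimate above can be turned into a short calculation. The Python sketch below uses the relations t_I = ħ/(2|J2 − J1|) and |φ_I| = K t_I/ħ from the text; the numerical values chosen for K and J1 are hypothetical placeholders, not values taken from the source, and the function names are illustrative.

```python
import math

HBAR_EV_S = 6.582119569e-16  # reduced Planck constant in eV*s


def interaction_time(j1_ev, j2_ev):
    """Interaction time t_I = hbar / (2 |J2 - J1|), in seconds."""
    return HBAR_EV_S / (2 * abs(j2_ev - j1_ev))


def phase_magnitude(k_ev, j1_ev, j2_ev):
    """Magnitude of the accumulated phase, K t_I / hbar = K / (2 |J2 - J1|)."""
    return abs(k_ev) / (2 * abs(j2_ev - j1_ev))


def hopping_difference_for_phase(k_ev, target_phase):
    """|J2 - J1| required for the phase magnitude to equal target_phase."""
    return abs(k_ev) / (2 * target_phase)


# Hypothetical example values (assumptions, in eV).
K = 0.050
J1 = 0.040

# Choose J2 so that the interaction-induced phase has magnitude pi.
delta_j = hopping_difference_for_phase(K, math.pi)
J2 = J1 + delta_j

print("required |J2 - J1| (eV):", delta_j)
print("interaction time (s):", interaction_time(J1, J2))
print("phase magnitude (rad):", phase_magnitude(K, J1, J2))
```

Because the phase depends only on the ratio K/|J2 − J1|, a phase of magnitude π requires the hopping energies of the two lines to differ by K/(2π).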


13 A CNOT Gate 


Here we consider the CNOT gate shown in Fig. 8. This device is implemented by 
adding to the interferometer circuit of Fig.6 two more transmission lines that carry 
the control qubit. The lower of these two transmission lines comes in close proximity 
to the transmission line of the upper arm of the interferometer to form the controlled 
phase shifter discussed in Sect. 12. Here $A_m$, $B_m$, $C_m$ and $D_m$ denote the annihilation
operators for the right propagating exciton modes at various points in the device.
They also serve as position markers within the device. The input exciton modes are
denoted by $A_m^{\mathrm{in}}$ and the output exciton modes are denoted by $D_m^{\mathrm{out}}$. The controlled
phase shifter consists of the parallel transmission line segments B2-C2 and B3-C3.

The core of the CNOT gate is analyzed first. Because a dual-rail representation
is employed, when the control qubit and target qubit enter the device, two excitons
are present in the device, one residing in the control transmission lines and the other
residing in the interferometer. Hence, to analyze the performance of the core of
this device, one can restrict the analysis to the state space spanned by the basis
set:

$$
\mathcal{B}_{CN} = \left\{ A_1^\dagger A_3^\dagger|0\rangle,\; A_1^\dagger A_4^\dagger|0\rangle,\; A_2^\dagger A_3^\dagger|0\rangle,\; A_2^\dagger A_4^\dagger|0\rangle \right\}. \tag{100}
$$

Expressed in terms of the $B_m^\dagger$ creation operators, the $A_m^\dagger$ creation operators are
given by

$$
A_1^\dagger = B_1^\dagger, \tag{101}
$$

$$
A_2^\dagger = B_2^\dagger, \tag{102}
$$

$$
A_3^\dagger = \frac{1}{\sqrt{2}}\left(-i B_3^\dagger + B_4^\dagger\right), \tag{103}
$$

$$
A_4^\dagger = \frac{1}{\sqrt{2}}\left(B_3^\dagger - i B_4^\dagger\right). \tag{104}
$$

Fig. 8 A controlled NOT gate (CNOT) consisting of an interferometer with one arm coupled to one
transmission line of the control qubit transmission line pair to form a controlled phase shift element.
The interferometer consists of a branch line coupler B, followed by the phase shifting elements
P, followed in turn by a second branch line coupler. Note that the input and output transmission
line 3 has been shortened by one dye relative to the other input and output transmission lines. This
implements propagation delays that put the scattering matrix of the device in standard form for that
of a CNOT gate


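The first two equations express propagation along the control qubit transmission
lines. The second two equations express the basis change transformation performed
by the first branch line coupler, see Eq. (82). With relationships Eq. (101) through
(104), one obtains

$$
\begin{aligned}
A_1^\dagger A_3^\dagger &= \frac{1}{\sqrt{2}}\left(-i B_1^\dagger B_3^\dagger + B_1^\dagger B_4^\dagger\right), \\
A_1^\dagger A_4^\dagger &= \frac{1}{\sqrt{2}}\left(B_1^\dagger B_3^\dagger - i B_1^\dagger B_4^\dagger\right), \\
A_2^\dagger A_3^\dagger &= \frac{1}{\sqrt{2}}\left(-i B_2^\dagger B_3^\dagger + B_2^\dagger B_4^\dagger\right), \\
A_2^\dagger A_4^\dagger &= \frac{1}{\sqrt{2}}\left(B_2^\dagger B_3^\dagger - i B_2^\dagger B_4^\dagger\right).
\end{aligned} \tag{105}
$$

The relationships between $B_m^\dagger$ and $C_m^\dagger$ are now established. Because excitons on
transmission lines 1 and 4 do not interact with excitons on the other transmission
lines in the region P, one has

$$
B_1^\dagger = C_1^\dagger \tag{106}
$$

and

$$
B_4^\dagger = e^{i\phi_F} C_4^\dagger, \tag{107}
$$

where, in writing the last equation, allowance has been made for a fixed propagation
delay characterized by the fixed phase $\phi_F$ that can be engineered to have a desired
value. When no exciton exists on transmission line 3, the exciton on transmission
line 2 does not interact with excitons on the other transmission lines. In this case

$$
B_2^\dagger = C_2^\dagger. \tag{108}
$$

Likewise, when no exciton exists on transmission line 2, $B_3^\dagger = C_3^\dagger$.
One thus has the relations

$$
B_1^\dagger B_3^\dagger = C_1^\dagger C_3^\dagger, \tag{109}
$$

$$
B_1^\dagger B_4^\dagger = e^{i\phi_F} C_1^\dagger C_4^\dagger, \tag{110}
$$

$$
B_2^\dagger B_4^\dagger = e^{i\phi_F} C_2^\dagger C_4^\dagger. \tag{111}
$$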

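When an exciton exists on each of transmission lines 2 and 3, the two excitons
interact, and from Eq. (99) one has

$$
B_2^\dagger B_3^\dagger = e^{i\phi_I} C_2^\dagger C_3^\dagger. \tag{112}
$$

The equations of (105) now yield

$$
\begin{aligned}
A_1^\dagger A_3^\dagger &= \frac{1}{\sqrt{2}}\left(-i\, C_1^\dagger C_3^\dagger + e^{i\phi_F} C_1^\dagger C_4^\dagger\right), \\
A_1^\dagger A_4^\dagger &= \frac{1}{\sqrt{2}}\left(C_1^\dagger C_3^\dagger - i e^{i\phi_F} C_1^\dagger C_4^\dagger\right), \\
A_2^\dagger A_3^\dagger &= \frac{1}{\sqrt{2}}\left(-i e^{i\phi_I} C_2^\dagger C_3^\dagger + e^{i\phi_F} C_2^\dagger C_4^\dagger\right), \\
A_2^\dagger A_4^\dagger &= \frac{1}{\sqrt{2}}\left(e^{i\phi_I} C_2^\dagger C_3^\dagger - i e^{i\phi_F} C_2^\dagger C_4^\dagger\right).
\end{aligned} \tag{113}
$$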


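The mode transformations between $C_m^\dagger$ and $D_m^\dagger$ are similar to those of Eqs. (101)
through (104) and will not be presented here. One finds

$$
\begin{aligned}
A_1^\dagger A_3^\dagger &= \frac{1}{2}\left[\left(e^{i\phi_F} - 1\right) D_1^\dagger D_3^\dagger - i\left(e^{i\phi_F} + 1\right) D_1^\dagger D_4^\dagger\right], \\
A_1^\dagger A_4^\dagger &= \frac{1}{2}\left[-i\left(e^{i\phi_F} + 1\right) D_1^\dagger D_3^\dagger - \left(e^{i\phi_F} - 1\right) D_1^\dagger D_4^\dagger\right], \\
A_2^\dagger A_3^\dagger &= \frac{1}{2}\left[\left(e^{i\phi_F} - e^{i\phi_I}\right) D_2^\dagger D_3^\dagger - i\left(e^{i\phi_F} + e^{i\phi_I}\right) D_2^\dagger D_4^\dagger\right], \\
A_2^\dagger A_4^\dagger &= \frac{1}{2}\left[-i\left(e^{i\phi_F} + e^{i\phi_I}\right) D_2^\dagger D_3^\dagger - \left(e^{i\phi_F} - e^{i\phi_I}\right) D_2^\dagger D_4^\dagger\right].
\end{aligned} \tag{114}
$$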

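Now the fixed phase on transmission line 4 is chosen to be $\phi_F = \pi$ and the
interaction-induced phase shift, Eq. (97), is chosen to be $\phi_I = \pi$. The transformation
(114) then becomes

$$
\begin{aligned}
A_1^\dagger A_3^\dagger|0\rangle &= -D_1^\dagger D_3^\dagger|0\rangle, \\
A_1^\dagger A_4^\dagger|0\rangle &= D_1^\dagger D_4^\dagger|0\rangle, \\
A_2^\dagger A_3^\dagger|0\rangle &= i\,D_2^\dagger D_4^\dagger|0\rangle, \\
A_2^\dagger A_4^\dagger|0\rangle &= i\,D_2^\dagger D_3^\dagger|0\rangle.
\end{aligned} \tag{115}
$$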


By identifying an exciton on transmission lines 1 and 3 to correspond to a Boolean
0 and an exciton on transmission lines 2 and 4 to correspond to a Boolean 1, the
scattering matrix corresponding to the basis transformation Eq. (115) is given by

$$
\begin{pmatrix}
-1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & i \\
0 & 0 & i & 0
\end{pmatrix}. \tag{116}
$$

Comparing this with Eq. (20) one sees that this transformation is not quite the standard
CNOT gate transformation, but it does exhibit CNOT functionality.

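The standard CNOT gate can be obtained from this core by introducing a phase
shift of −π/2 at the transmission line 3 input port and the output port. This is
implemented in Fig. 8 by making the transmission line segments for input and output
ports 3 one dye shorter than the corresponding transmission line segments for the
other input and output ports. That is, one implements the transformations

$$
\begin{aligned}
A_3^\dagger &= i\,A_3^{\mathrm{in}\dagger}, \\
D_3^\dagger &= -i\,D_3^{\mathrm{out}\dagger}, \\
A_m^\dagger &= A_m^{\mathrm{in}\dagger} \quad \text{if } m \in \{1, 2, 4\}, \\
D_m^\dagger &= D_m^{\mathrm{out}\dagger} \quad \text{if } m \in \{1, 2, 4\}.
\end{aligned} \tag{117}
$$

With these equations, Eq. (115) yields

$$
\begin{aligned}
A_1^{\mathrm{in}\dagger} A_3^{\mathrm{in}\dagger}|0\rangle &= D_1^{\mathrm{out}\dagger} D_3^{\mathrm{out}\dagger}|0\rangle, \\
A_1^{\mathrm{in}\dagger} A_4^{\mathrm{in}\dagger}|0\rangle &= D_1^{\mathrm{out}\dagger} D_4^{\mathrm{out}\dagger}|0\rangle, \\
A_2^{\mathrm{in}\dagger} A_3^{\mathrm{in}\dagger}|0\rangle &= D_2^{\mathrm{out}\dagger} D_4^{\mathrm{out}\dagger}|0\rangle, \\
A_2^{\mathrm{in}\dagger} A_4^{\mathrm{in}\dagger}|0\rangle &= D_2^{\mathrm{out}\dagger} D_3^{\mathrm{out}\dagger}|0\rangle.
\end{aligned} \tag{118}
$$

The scattering matrix corresponding to this transformation is

$$
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0
\end{pmatrix}, \tag{119}
$$

which is the standard scattering matrix, Eq. (20), for the CNOT gate [33].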

This completes the demonstration that a set of gates enabling universal quantum 
computation can be implemented as suitably constructed dye aggregates in which 
the exciton dynamics is governed by the Frenkel Hamiltonian Eq. (27). 
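The chain of substitutions leading to the CNOT matrix can be reproduced numerically. The Python sketch below works in the two-exciton basis ordered (1,3), (1,4), (2,3), (2,4), where each pair gives the control line and the target line. It assumes the creation-operator form of the coupler from Eqs. (103) and (104), the phases of region P as described in the text, and port factors of i and −i on transmission line 3; these readings, and all function names, are assumptions made for illustration.

```python
import cmath
import math

INV_SQRT2 = 1 / math.sqrt(2)

# Coupler acting on target lines (3, 4) in creation-operator form:
# row j lists the coefficients of the output operators for input operator j.
T = [[-1j * INV_SQRT2, INV_SQRT2],
     [INV_SQRT2, -1j * INV_SQRT2]]


def matmul(a, b):
    """Multiply two square complex matrices stored as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def embed_on_target(t):
    """Embed a 2x2 target matrix in the basis (1,3), (1,4), (2,3), (2,4);
    the control line is left unchanged."""
    m = [[0j] * 4 for _ in range(4)]
    for c in range(2):
        for i in range(2):
            for j in range(2):
                m[2 * c + i][2 * c + j] = t[i][j]
    return m


def cnot_core(phi_f, phi_i):
    """Coupler, region P phases, coupler. Row = input state, column = output."""
    ef, ei = cmath.exp(1j * phi_f), cmath.exp(1j * phi_i)
    phases = [1, ef, ei, ef]  # (1,3): none, (1,4): fixed, (2,3): interaction, (2,4): fixed
    region_p = [[phases[i] if i == j else 0 for j in range(4)] for i in range(4)]
    coupler = embed_on_target(T)
    return matmul(matmul(coupler, region_p), coupler)


def with_port_delays(core):
    """Apply the assumed -pi/2 delays on the line 3 input and output ports."""
    f = [-1j, 1, -1j, 1]  # factor -i for states with the target exciton on line 3
    return [[f[i] * core[i][j] * f[j] for j in range(4)] for i in range(4)]


CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]]

core = cnot_core(math.pi, math.pi)
gate = with_port_delays(core)
max_dev = max(abs(gate[i][j] - CNOT[i][j]) for i in range(4) for j in range(4))
print("max deviation from standard CNOT:", max_dev)
```

Changing `phi_i` away from π in `cnot_core` shows how an imperfect interaction phase leaks amplitude into the wrong target line when the control is set.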


14 Exciton-Based Quantum Computer Architecture 


Having discussed individual gates, the overall architecture of an exciton-based quan- 
tum computer is presented. Figure 9 indicates what the quantum computer might look 
like, how it is initialized and how the result of the computation is delivered as output. 

Between the two vertical dashed lines is the computer circuit itself. In this case, 
the circuit for a Fredkin gate was chosen as a stand-in for a general quantum computer 
circuit [45]. In this case, three qubit lines run parallel to each other from left to right. 
In the dual-rail representation each of these lines consists of two transmission lines. 
The boxes represent single qubit gates. The boxes labeled H are Hadamard gates
that perform the transformation given by Eq. (17). The boxes labeled T are phase
gates, of the form of Eq. (18), that impose a relative phase of π/4 between the two
transmission lines of a qubit. Coupling between qubit wires is through the CNOT
gates (Fig. 1).

Fig. 9 A schematic indicating the architecture of an exciton-based many-body quantum walk
quantum computer. See text for details

The input ports are represented by the antennas at the left end of each qubit line. 
Each of these antennas would couple to a separate electromagnetic field mode. On- 
demand single photon sources would be coupled to these antennas to generate the 
exciton input state. In principle the conversion of a photon into an exciton can be done 
with unit quantum efficiency, but in practice the coupling optics and antenna design 
could be quite challenging. In addition, the single photon sources would need to 
emit short optical pulses all timed to simultaneously initiate each qubit. The excitons 
propagate ballistically (in sync) through the gate network to the output. Note that the 
number of output lines is the same as the number of input lines. This is a consequence 
of unitary evolution. The output can be delivered to photodetectors by antennas at the 
output side (right side of the circuit). These antennas would have the same structure 
as the input antennas and would, with unit quantum efficiency, convert an exciton to a 
photon that would then be detected with a unit quantum efficiency photodetector. The 
output qubit lines (to the right side of the rightmost vertical dashed line) have been
drawn to different lengths. Each qubit will thus arrive at its detector at a different time
due to propagation delays. In this manner, the output of the quantum computer is time 
multiplexed so that the output can be read by noting the arrival time of each photon 
at the photodetector. This scheme has the advantage that only a single photodetector 
need be employed, which should simplify the optics that delivers the photons to the 
photodetector. 

The quantum computer, as described, is a special purpose device. For each problem 
to be addressed by quantum computation, a special purpose circuit would be assem- 
bled to carry out that computation. In principle, a general purpose quantum computer 
could be implemented that uses classical switches to reconfigure the circuits. With 
DNA nanotechnology this might be done using strand displacement techniques to 
reconfigure the circuit [46]. 


15 But Isn’t a Quantum Computer Just an Analog 
Computer? 


The question posed in this section heading is often expressed. At first sight, con- 
structing a quantum computer is simply a matter of assembling a physical system 
that implements a desired unitary transformation. Wave interference effects alone 
enable the implementation of an arbitrary unitary transformation, and this can be 
done at the classical level with a collection of optical beam splitters [30]. A quan- 
tum computer differs in two crucial ways from a classical computer. First, quantum 
superposition, in effect, enables the same gate to carry out multiple operations
simultaneously, which greatly reduces the parts count or the number of steps needed to
carry out a computation for those tasks amenable to quantum speedup. Second, if the 




gate error rate is below a certain threshold value, quantum error correction can be 
implemented, enabling scalable quantum computation with imperfect gates [47, 48]. 
In contrast, the precision and accuracy of analog computers is generally limited by 
the precision and accuracy of their components. Error correcting quantum computing 
schemes have been devised for which the tolerance for errors is about 1% per gate 
operation, which is still quite demanding. Nevertheless, this is a good reach goal 
to drive technology and the resulting information processing technology is likely 
to have applications even if full-scale quantum computing is not achieved, particu- 
larly because of the compact size (molecular scale) of the gates and the femtosecond 
switching time for gates employing optical transitions. 


16 Molecular Vibrations 


A number of imperfections that occur at the molecular level can give rise to gate 
errors or present challenges that must be overcome to construct viable gates. These 
include the dispersiveness inherent in exciton transmission lines consisting of an 
array of dyes, errors in DNA assembly, and Brownian motion that modulates the 
gate parameters as a function of time. The most serious “imperfections” result from 
the interaction of excitons with molecular vibrations. How this interaction is modeled 
is discussed here. 

A molecule in its ground state has a structural configuration characterized by 
the equilibrium position of the nucleus of each atom. Should the molecule absorb 
a photon and transition to its lowest optically allowed excited state, the atomic
nuclei at the instant of the transition will still be in the ground-state
equilibrium configuration, as optical transitions occur on a shorter time scale than
that required for the atomic nuclei to readjust their positions. As a consequence, the 
atomic nuclei are displaced from their excited state equilibrium position. The system 
responds by converting this potential energy into kinetic energy as the nuclei accel- 
erate toward their excited state equilibrium positions. The result is that the molecule 
undergoes molecular vibrations. These vibrations couple to the environment thereby 
providing a means of energy exchange between the molecule and the environment. 
The result is a scrambling of phases that washes out interference effects. This process 
is referred to as decoherence. This process degrades quantum gate performance, as 
these gates rely on quantum interference effects. 

The exciton-vibration coupling is well modeled by what is often referred to as
the Frenkel-Holstein Hamiltonian [49, 50]. This Hamiltonian can be written as the
sum of the Frenkel Hamiltonian $H_F$ of Eq. (27) with a Hamiltonian $H_H$ (the Holstein
part) characterizing the dynamics of the molecular vibrations and their coupling to
the excitons

$$
H = H_F + H_H. \tag{120}
$$

The Hamiltonian $H_H$ is given by

$$
H_H = \sum_{m,\alpha} E^v_{m,\alpha}\, a^\dagger_{m,\alpha} a_{m,\alpha}
+ \sum_{m,\alpha} E^v_{m,\alpha} \lambda_{m,\alpha}\, B_m^\dagger B_m \left(a_{m,\alpha} + a^\dagger_{m,\alpha}\right), \tag{121}
$$

where $a_{m,\alpha}$ is the annihilation operator for a quantum of vibration for the αth vibration
mode on molecule m and the Hermitian adjoint $a^\dagger_{m,\alpha}$ is the corresponding creation
operator. These satisfy Bose commutation relations similar to those of the excitons
Eqs. (33) and (34). $E^v_{m,\alpha}$ is the energy of a quantum of vibration for the αth vibration
mode on molecule m, and $\lambda_{m,\alpha}$ is the displacement between the equilibrium
position of the ground and excited electronic state for the αth vibration mode on the
mth molecule. The sums are over all vibration modes of the molecule and over all
molecules.

Some insight into the consequences of the exciton-vibration coupling can be
obtained by considering the Heisenberg equation of motion for the exciton annihilation
operator B for a single molecule. For a single molecule the full Hamiltonian,
neglecting two-exciton terms, becomes

$$
H = E^e B^\dagger B + \sum_\alpha E^v_\alpha\, a^\dagger_\alpha a_\alpha
+ \sum_\alpha E^v_\alpha \lambda_\alpha\, B^\dagger B \left(a_\alpha + a^\dagger_\alpha\right), \tag{122}
$$

where, because we are dealing with only one molecule, the molecule index has been
suppressed. The Heisenberg equation of motion for B is given by

$$
\frac{dB}{dt} = \frac{i}{\hbar}[H, B]. \tag{123}
$$

For the Hamiltonian Eq. (122) this yields

$$
\frac{dB}{dt} = -i\left[\omega^e + \sum_\alpha \omega^v_\alpha \lambda_\alpha \left(a_\alpha + a^\dagger_\alpha\right)\right] B, \tag{124}
$$

where $\omega^v_\alpha$ is the vibration frequency of the αth vibration mode and is related to the
energy of a quantum of vibration by $E^v_\alpha = \hbar\omega^v_\alpha$. The operator $a_\alpha + a^\dagger_\alpha$ is the position
coordinate for the vibrational mode α. Treating this as a classical variable, Eq. (124)
can be integrated to yield


$$
B(t) = B(0)\, e^{-i\omega^e t}\, e^{-i\phi_v(t)}, \tag{125}
$$

where $\phi_v(t)$ is a fluctuating phase arising from the motion of all the molecular
vibration modes

$$
\phi_v(t) = \int_0^t \sum_\alpha \omega^v_\alpha \lambda_\alpha \left[a_\alpha(t') + a^\dagger_\alpha(t')\right] dt'. \tag{126}
$$

Fig. 10 An example of a dye whose aggregate absorbance and circular dichroism spectra can
be well modeled by including only one vibrational mode of the dye. Shown is the cyanine dye
Cy5 structure (top panel) and the monomer absorbance spectrum (bottom panel). The shoulder at
600 nm in the absorbance spectrum is due to the dominant vibrational mode. The absorbance units
are 10⁵ M⁻¹ cm⁻¹


This random phase causes the phase winding of B to deviate from that for pure
sinusoidal oscillation $e^{-i\omega^e t}$. This spoils interference effects and gives a width to the
spectral lines of the dye absorption and emission spectra.
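How a random phase washes out interference can be illustrated with a toy calculation. The Python sketch below is not the Frenkel-Holstein model: it simply assumes that the accumulated vibrational phase is a zero-mean Gaussian random variable with a chosen standard deviation (an assumption made here for illustration) and averages the phase factor over many realizations. The function name and sample count are likewise illustrative.

```python
import cmath
import math
import random


def coherence(phase_std, samples=20000, seed=0):
    """Magnitude of the ensemble average of exp(-i*phi) for Gaussian phases.

    A value of 1 means the phase is perfectly defined (full interference
    contrast); a value near 0 means the phase is scrambled.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    total = sum(cmath.exp(-1j * rng.gauss(0.0, phase_std))
                for _ in range(samples))
    return abs(total) / samples


for std in (0.0, 0.5, 1.0, 2.0):
    # For Gaussian statistics the exact average is exp(-std**2 / 2).
    print(f"phase std = {std:.1f}: coherence = {coherence(std):.3f}, "
          f"Gaussian prediction = {math.exp(-std ** 2 / 2):.3f}")
```

As the spread of the accumulated phase grows, the averaged phase factor shrinks toward zero, which is the loss of interference contrast that the text calls decoherence.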

It is often the case that a coupling between the exciton and vibration is particularly
strong for a single or small group of vibration modes [51]. An example of this is
the cyanine dye Cy5 whose structure and absorbance spectrum [42] are shown in
Fig. 10. The absorbance spectrum shows an absorbance maximum at about 650 nm.
This corresponds to the optical transition from the ground electronic state with no
vibrational quanta to the lowest excited electronic state with no vibrational quanta.
The shoulder at 600 nm on the short wavelength side of the peak corresponds to the
optical transition from the ground electronic state with no vibrational quanta to the
lowest excited electronic state with one vibrational quantum for a dominant vibrational
mode. In the case when one dominant vibration mode occurs the absorbance
spectrum can be well modeled by including only this one vibrational mode for each
molecule. In this case the Hamiltonian Eq. (121) reduces to

$$
H_H = \sum_{m=1}^{N} E^v_m\, a_m^\dagger a_m
+ \sum_{m=1}^{N} E^v_m \lambda_m\, B_m^\dagger B_m \left(a_m + a_m^\dagger\right). \tag{127}
$$

This Hamiltonian is simple enough that the energy eigenstates and eigenvalues can
be solved to a high degree of accuracy by numerical methods while still adequately
accounting for the spectral features in optical absorption spectra of aggregates.
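The two spectral features quoted for Cy5 give a rough estimate of the energy of the dominant vibrational quantum. The Python sketch below converts the approximate peak (650 nm) and shoulder (600 nm) wavelengths to photon energies and takes the difference. It also evaluates the Poisson weights that a displaced harmonic oscillator predicts for the monomer vibronic lines, a standard textbook result that is assumed here rather than taken from the source; the displacement value 0.7 is a hypothetical placeholder, and the function names are illustrative.

```python
import math

HC_EV_NM = 1239.841984  # Planck constant times speed of light, in eV*nm


def photon_energy_ev(wavelength_nm):
    """Photon energy in eV for a given vacuum wavelength in nm."""
    return HC_EV_NM / wavelength_nm


def vibrational_quantum_ev(peak_00_nm, peak_01_nm):
    """Energy spacing between the 0-0 peak and the 0-1 vibronic shoulder."""
    return photon_energy_ev(peak_01_nm) - photon_energy_ev(peak_00_nm)


def vibronic_weights(displacement, n_max):
    """Relative weights of the 0-n lines for a displaced oscillator:
    exp(-s) * s**n / n!, with s = displacement**2 (assumed convention)."""
    s = displacement ** 2
    return [math.exp(-s) * s ** n / math.factorial(n)
            for n in range(n_max + 1)]


# Approximate wavelengths quoted in the text for the Cy5 monomer.
e_vib = vibrational_quantum_ev(650.0, 600.0)
print(f"estimated vibrational quantum: {e_vib:.3f} eV")

# Hypothetical displacement, chosen only to illustrate a weak shoulder.
weights = vibronic_weights(0.7, 3)
print("relative line weights (0-0, 0-1, 0-2, 0-3):",
      [round(w, 3) for w in weights])
```

With these wavelengths the spacing comes out at roughly 0.16 eV, which is large compared with thermal energy at room temperature, consistent with treating the vibration as starting in its ground state.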

When an exciton moves from one dye molecule to another, the dye it leaves 
undergoes a transition from the excited state to the ground state while the dye it 
moves to undergoes a transition from the ground state to the excited state. Both of 
these transitions induce molecular vibrations; however, due to conservation of energy 
the exciton cannot endlessly shed vibrations. As a result, the exciton carries a cloud of 
vibrations (or molecular distortions) with it [52]. The exciton becomes a composite 
particle, a bundle of electronic and vibrational energy. This modifies the dispersion 
relation for an exciton propagating along an exciton transmission line. Nevertheless, 
it should still be possible to engineer exciton quantum gates that work in spite of the 
composite nature of the exciton. 

It is also noted that molecular vibrations exhibit longer decoherence times than 
excitons [53]. The exciton hopping interaction provides a means of coupling vibra- 
tions between dyes. It is thus an interesting question as to whether quantum comput- 
ing with dye aggregates could be implemented using quanta of molecular vibrations 
in which the excitons simply provide a means to control the flow of quantum infor- 
mation from dye molecule to dye molecule. 


17 Conclusion 


It has been shown that exciton-based quantum gates can be constructed from dye 
aggregates; and the aggregate configurations for a set of dyes sufficient to enable 
universal quantum computation have been presented. How these could be assembled 
into circuits with inputs and outputs has been discussed. In addition, nonidealities 
resulting from the coupling between excitons and molecular vibrations have been 
treated. 

The question arises: What are the prospects for realizing such devices in practice? 
Exciton delocalization extending over 30-100 dyes has been reported in the literature 
[24]. The transmission lines in the largest gate presented, Fig. 8, are 23 dyes long. This 
suggests that it should be possible to experimentally demonstrate the gates that have 
been presented, as well as small circuits assembled from such gates. Constructing 
quantum computers that would be competitive with conventional computers would 
require that workarounds be devised for a number of nonidealities that dye molecules 
exhibit. 

Although DNA nanotechnology currently offers the most promising means by 
which to assemble dye aggregates into functioning quantum gates and circuits, the 
technology still has limitations that make the construction of these gates challenging. 
It is desirable that the Jm,n and Km,n be as large as possible to enable the gate or 
circuit to complete its operation before coherence is lost. Ideally, the dyes would be 
stacked as closely as possible. That is, one would like the spacing between dyes to 
be comparable to the base stacking distance of DNA. The helical twist of DNA, for
which a full turn occurs over roughly 3.4 nm, however, makes it difficult to take
advantage of the base stacking to pack dyes closely together. One would like to
lay out the gates in a two-dimensional or three-dimensional arrangement. Here the
2 nm diameter of duplex DNA complicates the stacking of dyes close together in the
direction orthogonal to the DNA helix direction. The size mismatches between the 
DNA structure and the desired spacing between dyes could be ameliorated if dyes 
were covalently linked into transmission-line lengths that could span the distance of a 
helical turn of DNA or the distance between neighboring duplex strands. Covalently 
linked aggregates forming gates would also help. DNA assembly would then be used 
to arrange these larger components into a desired circuit. 

Finally, the quantum computer architecture proposed here is not necessarily optimum.
Childs et al. [20] provide a different set of gates that could be implemented with
dye aggregates in which the exciton-exciton interaction characterized by the
anharmonicity parameter Δn, rather than that characterized by Km,n, provides the
interaction needed to implement controlled basis change gates. A search for
alternative means with which to do information processing
or quantum computing using dye aggregates could be productive. In this regard, it
is noted that molecular vibrations exhibit coherence times that are longer than those
of excitons. This suggests that it may be worth considering how molecular vibration
quanta might be used to process and store quantum information.


Appendix


Here expressions for $J_{m,n}$ and $K_{m,n}$ are presented. Both of these quantities 
represent Coulomb interaction energies between pairs of dyes. The first results from 
the Coulomb interaction between the transition charge densities of a pair of dyes, 
whereas the latter arises from differences in the Coulomb interaction between a pair 
of dyes resulting from differences in the ground-state and excited-state charge 
densities of the dyes [34]. Expressions for these energies greatly simplify when the 
distance between the dye molecules is much greater than the size of the molecule. 
In this case an approximation can be made in which the charge distribution on a dye 
is represented by a dipole moment. Even when the distance between the dyes is less 
than their lengths, the dipole approximation often provides an estimate of $J_{m,n}$ 
and $K_{m,n}$ that is good to within a factor of two. 

Consider first $J_{m,n}$. The dipole component of the transition charge density is 
referred to as the transition dipole. It is a vector quantity whose magnitude for dye 
$m$ is here denoted by $\mu_m$. The dipole vector generally is parallel to the long 
axis of the dye molecule. In the dipole approximation $J_{m,n}$ is given by 

$$J_{m,n} = \frac{\mu_m \mu_n}{4\pi\epsilon\epsilon_0 R_{m,n}^3}\left[\cos(\theta_1) - 3\cos(\theta_2)\cos(\theta_3)\right], \qquad (128)$$

where $R_{m,n}$ is the distance between the centers of the two dyes, $\epsilon_0$ is the 
permittivity of free space, and $\epsilon$ is the relative dielectric constant of the 
medium in which the dyes reside. $\theta_1$ is the angle the two dyes make with respect 
to each other, and $\theta_2$ is the angle between dye $m$ and the line between the 
centers of the two dyes. Similarly, $\theta_3$ is the angle between dye $n$ and the line 
between the centers of the two dyes. Thus there are four quantities, $R_{m,n}$, 
$\theta_1$, $\theta_2$, and $\theta_3$, that can be adjusted to fine-tune $J_{m,n}$ to a 
desired value. 
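
As a minimal numerical sketch of Eq. (128), the snippet below evaluates the dipole-approximation coupling for two limiting arrangements of parallel dyes. The function names, the 10 D dipole magnitude, the 1 nm separation, and the choice of a relative dielectric constant of 1 are illustrative assumptions, not values taken from the text.

```python
import math

EPSILON_0 = 8.8541878128e-12     # permittivity of free space, F/m
DEBYE = 3.33564e-30              # 1 debye expressed in C*m
ELECTRON_VOLT = 1.602176634e-19  # 1 eV expressed in J


def orientation_factor(theta1, theta2, theta3):
    """Bracketed angular term of Eq. (128); all angles in radians."""
    return math.cos(theta1) - 3.0 * math.cos(theta2) * math.cos(theta3)


def dipole_coupling(mu_m, mu_n, r, theta1, theta2, theta3, eps_r=1.0):
    """J_{m,n} in joules for dipole magnitudes mu_m, mu_n (C*m) a distance r (m) apart."""
    prefactor = mu_m * mu_n / (4.0 * math.pi * eps_r * EPSILON_0 * r**3)
    return prefactor * orientation_factor(theta1, theta2, theta3)


# Hypothetical illustration values (assumptions, not from the text).
mu = 10.0 * DEBYE   # transition dipole magnitude
r = 1.0e-9          # center-to-center distance

# Parallel dyes side by side: theta1 = 0, theta2 = theta3 = 90 degrees.
j_side_by_side = dipole_coupling(mu, mu, r, 0.0, math.pi / 2, math.pi / 2)
# Parallel dyes head to tail: all three angles are zero.
j_head_to_tail = dipole_coupling(mu, mu, r, 0.0, 0.0, 0.0)

print(f"side by side: {j_side_by_side / ELECTRON_VOLT * 1e3:.1f} meV")
print(f"head to tail: {j_head_to_tail / ELECTRON_VOLT * 1e3:.1f} meV")
```

The two cases differ in sign and by a factor of two in magnitude, which illustrates how strongly orientation alone can tune $J_{m,n}$ at fixed $R_{m,n}$.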

Consider now $K_{m,n}$. Let $\Delta d_m$ denote the magnitude of the dipole component 
of the difference between the excited-state and ground-state charge densities of 
molecule $m$. The direction of this dipole also generally lies along the long axis of 
the dye molecule. But if the dye molecule has a bent shape or has a width comparable to 
its length, then it need not lie along the long axis of the molecule. It could even be 
perpendicular to the transition dipole. In general it will have some fixed angle with 
respect to the transition dipole. In the dipole-dipole approximation one has 

$$K_{m,n} = \frac{\Delta d_m \Delta d_n}{4\pi\epsilon\epsilon_0 R_{m,n}^3}\left[\cos(\phi_1) - 3\cos(\phi_2)\cos(\phi_3)\right]. \qquad (129)$$

As in the case of $J_{m,n}$, four quantities can be varied by adjusting the position or 
orientation of the two dyes. These are $R_{m,n}$, $\phi_1$, $\phi_2$, and $\phi_3$. 
Because $R_{m,n}$ is present in both Eqs. (128) and (129) and the transition dipole and 
the difference static dipole make a fixed angle with respect to each other, the degree 
to which $J_{m,n}$ and $K_{m,n}$ can be adjusted independently is constrained. When the 
transition dipoles and difference static dipoles are not parallel, however, $J_{m,n}$ 
and $K_{m,n}$ can still be adjusted independently of each other over a range of values. 
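
The following sketch illustrates this point for a deliberately simplified geometry. It assumes that both dyes lie in a common plane, that the angles in Eq. (129) are defined for the difference static dipoles in the same way the angles in Eq. (128) are defined for the transition dipoles, and that each difference static dipole is rotated in-plane by the same fixed angle `delta` from its transition dipole. These assumptions and all names are illustrative only.

```python
import math


def orientation_factor(a1, a2, a3):
    """Angular bracket shared by Eqs. (128) and (129); angles in radians."""
    return math.cos(a1) - 3.0 * math.cos(a2) * math.cos(a3)


def planar_factors(alpha_m, alpha_n, delta):
    """Angular factors of J and K for two coplanar dyes.

    alpha_m, alpha_n: in-plane angles of the transition dipoles of dyes m and n,
                      measured from the line joining the dye centers.
    delta:            assumed fixed in-plane angle between each transition dipole
                      and the corresponding difference static dipole.
    """
    factor_j = orientation_factor(alpha_m - alpha_n, alpha_m, alpha_n)
    factor_k = orientation_factor(alpha_m - alpha_n, alpha_m + delta, alpha_n + delta)
    return factor_j, factor_k


# Parallel dyes tilted so that 3*cos^2(alpha) = 1 make the J factor vanish.
alpha = math.acos(1.0 / math.sqrt(3.0))

# With a 30 degree offset the K factor survives while the J factor is zero.
factor_j, factor_k = planar_factors(alpha, alpha, math.radians(30.0))
# With parallel dipoles (delta = 0) both factors vanish together.
factor_j0, factor_k0 = planar_factors(alpha, alpha, 0.0)

print(f"delta = 30 deg: J factor = {factor_j:.3f}, K factor = {factor_k:.3f}")
print(f"delta =  0 deg: J factor = {factor_j0:.3f}, K factor = {factor_k0:.3f}")
```

In this toy geometry a nonzero offset between the two dipoles lets one coupling be switched off while the other remains, whereas parallel dipoles force both angular factors to track each other.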

General expressions for $J_{m,n}$ and $K_{m,n}$ are given in [34]; however, their 
evaluation requires that one obtain the transition, ground-state, and excited-state 
static charge densities for the molecules by an ab initio calculation, such as density 
functional theory and time-dependent density functional theory. 


Abstract The origins of DNA nanotechnology can be traced back to 1982, when 
Dr. Ned Seeman proposed assembling branched junctions into 3D lattices to facilitate 
protein crystallization. Over the past four decades, this concept has evolved 
into a multidisciplinary research field with vast potential for applications. In this 
mini review, we present a brief introduction to selected topics in nucleic acid 
nanotechnology, focusing on scaling up DNA assembly, achieving higher resolution, 
and transferring to RNA structural design. We discuss the advantages and 
challenges of each topic, aiming to shed light on the enormous potential of nucleic 
acid nanotechnology. 


1 Introduction 


In natural systems, biopolymers cooperatively assemble and interact at the nanoscale 
to form microscale structures. For example, the haploid human genome contains 
approximately 3 billion DNA base pairs, so a diploid cell carries about 6 billion. 
Based on the canonical B-form DNA duplex (0.34 nm per base pair), the total length 
of DNA in each cell is about 2 m. How does such long DNA fit into a nucleus that is 
only about 10 µm in diameter? The answer is DNA packaging. The assembly of 
chromosomal DNA is a highly regulated, hierarchical condensation involving many 
proteins. In eukaryotic cells, DNA is wrapped around histone proteins, producing a 
well-defined hierarchy of compact DNA-protein complexes. This assembly process 
spans a broad range of length scales, from nanometers to microns, while displaying 
organizational precision down to the angstrom level (Fig. 1). 
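
The length estimate above can be checked with a few lines of arithmetic. This is a rough sketch that takes about 3 billion base pairs per haploid genome copy (hence about 6 billion per diploid cell) together with the 0.34 nm rise and 10 µm nuclear diameter quoted in the text; the variable names are illustrative.

```python
RISE_PER_BP_NM = 0.34        # B-form DNA rise per base pair, nm
HAPLOID_GENOME_BP = 3.0e9    # approximate base pairs per haploid genome copy
NUCLEUS_DIAMETER_UM = 10.0   # approximate diameter of the nucleus, micrometers

# A diploid cell carries two copies of the genome.
diploid_bp = 2 * HAPLOID_GENOME_BP

# Contour length of all the DNA in one cell, in meters.
total_length_m = diploid_bp * RISE_PER_BP_NM * 1e-9

# How many nuclear diameters that length corresponds to.
length_to_diameter = total_length_m / (NUCLEUS_DIAMETER_UM * 1e-6)

print(f"DNA per diploid cell: {total_length_m:.2f} m")
print(f"Contour length / nuclear diameter: {length_to_diameter:.0f}")
```

The ratio of contour length to nuclear diameter is on the order of a few hundred thousand, which is the scale of compaction that DNA packaging has to achieve.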

Fig. 1 DNA assembly happens at multiple scales in nature. Current DNA nanotechnology has 
several challenges and opportunities to be expanded in ‘depth and breadth.’ Three selected topics 
will be discussed here: (1) Engineering cell-sized DNA structures; (2) Building self-assembled 
DNA crystals with atomic resolutions; (3) Transferring to RNA structural design 

Towards the goal of engineering bioinspired systems that rival natural systems, 
information-coding biopolymers such as nucleic acids [1-3], proteins [4-6], and 
lipids [7] have been used as building blocks in the assembly of designer 
nanoarchitectures and nanodevices. Among them, DNA self-assembly has been broadly 
exploited [8-13], and diverse design techniques [3, 14-17] and computational 
tools [18-24] have been used, yielding a wide variety of nanostructures as well as 
a deeper understanding of how to build with DNA. The success of DNA self-assembly 
in constructing various architectures is attributed to several factors. Firstly, 
Watson-Crick base pairing between complementary DNA strands is simple and 
highly predictable and thus makes the four-letter polymeric strands convenient units 
with designed pairing rules between each other. Secondly, the geometric features 
of DNA double helices are well understood, with a diameter of about 2 nm and 
3.4 nm per helical repeat for canonical B-form DNA [25]. Additionally, the devel- 
opment of several user-friendly software interfaces has facilitated the designing 
and viewing of even the most intricate DNA nanostructures before experimental 
testing [18-24]. Thirdly, modern organic chemistry and molecular biology provide 
a diverse toolbox to readily synthesize, modify, and replicate DNA molecules at 
a relatively low cost. Finally, the biocompatibility of DNA makes it suitable for 
constructing multicomponent nanostructures made from hetero-biomaterials with 
designed functions. 
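
The first two points can be made concrete with a short sketch: Watson-Crick pairing reduces to a reverse-complement rule, and duplex dimensions follow from the rise per base pair and the helical repeat. The sketch uses the rounded values quoted in this chapter (0.34 nm per base pair, 3.4 nm per helical repeat); the function names and the example sticky-end sequence are hypothetical.

```python
# Watson-Crick complement of each DNA base.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

RISE_PER_BP_NM = 0.34   # rise per base pair for B-form DNA, nm
PITCH_NM = 3.4          # length of one helical repeat, nm


def reverse_complement(seq):
    """Return the strand that pairs with seq, written 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def is_complementary(strand_a, strand_b):
    """True if strand_b is the exact Watson-Crick partner of strand_a."""
    return strand_b.upper() == reverse_complement(strand_a)


def duplex_length_nm(n_bp):
    """Approximate length of an n_bp duplex along the helix axis."""
    return n_bp * RISE_PER_BP_NM


def helical_turns(n_bp):
    """Approximate number of helical turns in an n_bp duplex."""
    return duplex_length_nm(n_bp) / PITCH_NM


sticky_end = "GATTACA"                    # hypothetical 7-nt sticky end
partner = reverse_complement(sticky_end)  # sequence designed to bind it

print(partner, is_complementary(sticky_end, partner))
print(f"21 bp duplex: {duplex_length_nm(21):.2f} nm, {helical_turns(21):.1f} turns")
```

Design tools build on exactly these two ingredients: sequence rules decide which strands bind, and helix geometry decides where the bound strands sit in space.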

Inspired by nature, scientists keep turning to DNA nanotechnology to create struc- 
tural designs that can also operate at various length scales. Here, I will share three 
examples that show how nanotechnology is moving closer to the goal of achieving 
precise structural control at the angstrom level within cells, as well as discuss 
some of the challenges that are involved. The first example focuses on the chal- 
lenge of engineering programmable structures at the scale of entire cells. This is a 
significant obstacle, as it requires creating something microns in size from many 
individual nanoscale components through programmable interactions. The second 
example highlights the challenge of engineering designer DNA crystals, which 
was the original vision for DNA nanotechnology. It has taken decades to achieve, but 
the process of its development has greatly expanded our understanding of molecular 
construction. Lastly, toward producing precision nanostructures inside living cells, 
RNA is highly advantageous because it can be folded cotranscriptionally. Fortunately, 
the great progress in developing DNA nanostructures has accelerated the 
development of RNA nanostructures. Nucleic acid nanotechnology, originally driven 
by curiosity, has greatly informed our understanding of structural biology and now 
points toward many exciting applications. 


2 Engineering Cell-Sized DNA Structures 


It is a challenging yet rewarding goal in DNA nanotechnology to make cell-sized 
structures because such materials have several potential applications that cannot be 
realized by smaller structures. For example, DNA origami [3] is a technique that folds 
a long single-stranded ‘scaffold’ DNA into a target object using hundreds of short 
‘staple’ DNA strands. This technique was first introduced in 2006 and has since been 
widely adopted. DNA origami nanostructures are generally 10-100 nm in diameter and, 
because of their compatible sizes, make excellent scaffolds for hosting a few enzymes 
or quantum dots. Similarly, micron-sized constructs with defined shapes and finite 
sizes would provide a fully addressable canvas for organizing larger guests, such as 
cells, with nanoscale precision, and would offer long-range interactions between guest 
molecules that smaller structures cannot achieve. Micron-sized 2D arrays could serve 
as high-throughput biological nanopore arrays for protein sequencing and biosensing. 
They could also be used as templates for creating long-range enzyme cascades and 
signaling pathways, modulating cell-free membranes, and eventually facilitating 
artificial cell engineering. 


2.1 Challenges 


Although various strategies have been developed to produce synthetic DNA 
architectures that exhibit significant geometric complexity, most current techniques 
are still limited to producing DNA structures smaller than 1 micron at a resolution of 
a few nanometers. To create larger DNA constructs with defined shapes and finite sizes, 
different strategies have been reported. Qian and coworkers advanced higher-order 
assemblies by using 2D square-shaped DNA origami with surface patterns and short 
sticky-end hybridization to create finite-sized 2D DNA canvases up to half a micron 
in size [26]. Higher-order 3D DNA origami structures have been achieved by using an 
angle-controllable V-shaped DNA origami object to form several polyhedra that 
are up to 450 nm in diameter [27]. Single-stranded tile (SST) DNA structures 
have also been extended to larger assemblies by employing a 52-nt brick design with 
four 13-nt domains [28]. The 2D and 3D origami-based strategies rely on weak 
interactions between building blocks and employ a step-by-step hierarchical assembly 
process that typically involves the formation and purification of individual origami, 
assembly of sub-units from a set of origami blocks, and the addition of origami 
blocks for final construction. However, there are challenges associated with these 
methods. The first challenge lies in designing DNA origami that can self-assemble 
into micron-sized structures. The individual origami needs to be accurately designed 
because any twist or distortion in these units will accumulate and become amplified 
during the higher-order assembly process. Another challenge is the low formation 
yield. Potential reasons for the low yield include defects in individual origami 
building blocks and the slow kinetics of higher-order origami assembly. For the SST 
method, an additional challenge is that sequence design matters as much as accurate 
geometric design, since there is no scaffold strand to guide the structure. In 
addition, the low formation yield and the high synthesis cost both add to the 
difficulty of using this method to create even larger structures. 

For the formation of 2D and 3D DNA superlattices, DNA origami nanostructures 
serve as repeating units that connect hierarchically into higher-order assemblies. 
Inspired by the first rationally designed DNA crystal structure [16], Liedl and 
coworkers created an origami version of the tensegrity triangle and demonstrated 
its assembly into rhombohedral crystalline lattices [29]. Several examples showed 
that weak interactions help mismatched DNA origami units dissociate from each 
other during assembly and thus promote the formation of correct target patterns [26, 
30]. One challenge is the accurate estimation of the mechanical properties of DNA 
structures. For instance, the formation of large 2D arrays generally suffers from the 
inherent flexibility of individual building blocks. It is important to develop methods 
for confining the assembly process to one plane, rather than 3D, to grow 2D 
crystalline assemblies. Surface-mediated growth, for example on lipid [31, 32] and 
mica surfaces [33], is also a helpful strategy for encouraging the development of 2D 
arrays. Recently, Gang and colleagues reported a novel technique of creating 
vertex-to-vertex hybridization between polyhedral DNA origami building blocks [34]. 
Instead of avoiding flexibility, they explored single-stranded loop linkers between 
origami, relied on geometric restriction, and harnessed the flexibility of the loops. 
This vertex-to-vertex hybridization is remarkably robust and programmable, and can 
form a wide variety of polyhedral geometries. Although DNA origami crystalline 
lattices do not reach the atomic resolution of designer DNA crystals, 
these higher-order arrays provide much larger cavities for hosting various guest 
molecules such as large enzyme complexes. 


2.2 Opportunities 


Although a variety of design techniques have been developed, it is desirable to further 
enrich the types of DNA building blocks with novel structural properties that enable 
efficient scale-up of DNA assemblies. New design strategies combined with 
mathematical graphics will provide additional opportunities for DNA nanoconstruction 
and enhance our understanding of DNA self-assembly as a programmable biomaterial. 
Advanced experimental validation techniques are also valuable for revealing the 
assembly dynamics at the individual unit level. In addition, it is highly desirable to 
develop simple yet robust procedures to improve the mechanical properties of DNA 
nanostructures, while preserving their nanometer precision and dynamic features 
such as strand displacement reactions [35, 36]. 

A key opportunity is to use hybrid assembly methods, which involve 
combining multiple types of building block materials and leveraging different 
interactions between building blocks. The co-assembly of polyhedral DNA origami and 
gold nanoparticles is a good example of this approach, showing its potential for 
superlattice construction. To ensure appropriate binding forces between building 
blocks for orthogonal regulation of assembled structures, a wide variety of 
binding forces should be investigated, ranging from weak interactions such as base 
stacking and sequence recognition to strong chemical bonds formed through click 
chemistry and enzymatic ligation. Using different binding forces will help to 
maximize stability locally while maintaining weak interactions between assembly 
units. The local optimization of recognition between building blocks will promote 
the formation of target structures, and the weak interactions will minimize possible 
mismatches. More importantly, integrating orthogonal interaction forces between 
assembly units could facilitate sophisticated assembly behaviors, such as the dynamic 
or developmental self-assembly observed in living cells [37]. 

The successful formation of micron-sized programmable DNA structures 
with nanometer precision would open many new opportunities for both fundamental 
and applied research. Micron-sized DNA assemblies with fully addressable surfaces 
will enable various types of synthetic cell engineering by employing structural DNA 
assemblies as spatial frameworks. DNA nanostructures equipped with 
membrane-interacting molecules have been demonstrated as nanoscale mechanical tools 
to scaffold and sculpt lipid bilayers [38, 39]. Large DNA assemblies will contribute 
to the creation of artificial cells comparable in size to living cells and deliver a 
unique interface between cell biology and biomolecular engineering. 


3 Building Designer DNA Crystals with Atomic Resolutions 


Crystal lattices with atomic resolution are useful materials for positioning guest 
molecules at specific locations within three-dimensional space to achieve various 
functions. Obtaining self-assembled nucleic acid crystals with programmable space 
groups and cavity sizes is not only a long-standing challenge in DNA nanotechnology 
but also one of its important research frontiers. As originally proposed by 
Nadrian Seeman in 1982 [1], introducing target molecules into DNA scaffolds 
would facilitate structure determination of guest molecules such as RNAs, peptides, 
and proteins. Rational design and synthesis of 3D DNA crystals provide both precisely 
designed symmetry and function. Such crystals can serve as porous scaffolds to 
arrange guest molecules at specific positions, so that the guest molecules become 
integral parts of the crystalline lattice. These hybrid crystalline structures are 
promising candidates for determining protein structures, especially those of 
membrane proteins. Self-assembling DNA crystals have also been treated as 3D 
templates for molecular electronics, such as information storage devices, and as 
zeolite-like nanoporous materials capable of catalysis [40] and molecular separations 
[41]. 


3.1 Challenges 


However, four decades after the initial proposal [1], only a few self-assembled 
DNA crystals displaying rationally designed 3D crystalline structures have been 
reported. In 2009, Seeman, Mao, and their coworkers created the first rationally 
designed DNA crystals, which were based on a tensegrity triangle DNA motif, 
demonstrating a set of crystals in the rhombohedral space group R3 [16]. These 
first-ever designed self-assembled DNA crystals were published 29 years after 
Dr. Seeman’s initial inspiration [42]. In 2016, Yan, Seeman, and coworkers reported a 
new designer DNA crystal based on a layered Holliday junction design with three 
distinct strands, solving the structure by X-ray crystallography to ~3 Å [43]. Later, 
a rationally designed, self-assembled 3D DNA crystal lattice with hexagonal symmetry 
was created using only two DNA strands. The six-fold symmetry, as well as the 
chirality of the crystal lattice, is directed by the Holliday junctions formed between 
the duplex motifs. Native crystals were measured and analyzed to ~3 Å resolution 
in the hexagonal space group P6 [44]. 

The main scientific challenges for creating designer DNA crystals lie in both 
fundamental design and experimental growth of such crystals. Three questions will 
be discussed: how to develop robust design methods for creating a wide 
variety of designer DNA crystals with prescribed lattices, how to introduce guest 
molecules into crystals at specific locations, and how to give DNA crystals 
chemical, physical, and mechanical properties better suited to broader applications. 

To create 3D DNA motifs that can be connected by rationally designed sticky-end 
connections rather than through nonspecific stacking, several factors should 
be examined carefully. Designer DNA crystals should be assembled from 
rigid tiles or motifs with appropriate numbers of sticky ends. Each DNA tile 
needs to be designed with sufficient connections to define the structural frame, and 
the interactions between tiles should be encoded in the sticky ends. The sequence 
design of DNA tiles should also be considered carefully. The DNA motifs are formed 
through an annealing process, which requires a slow temperature ramp. A systematic 
crystallization protocol needs to be established that integrates the annealing process 
with the crystallization process and optimizes the experimental conditions, including 
buffer conditions, reservoir conditions, and annealing temperatures, to improve the 
yield of large crystals. X-ray crystallography will be the major technique used to 
determine the structures of DNA crystals. Because X-ray diffraction requires 
relatively large crystals, other techniques, such as microcrystal electron diffraction 
(MicroED), can be employed to solve the structures of nanocrystals. 

Self-assembled DNA crystals contain large cavities that make them excellent 
scaffolds for attaching various biomolecules (e.g., peptides, proteins, and RNAs) to 
achieve different functional applications. The major hurdles to adding guest 
materials into designer DNA crystals are twofold. First, the binding of guest 
molecules to the DNA should not change the space group of the predefined 3D crystal 
lattice, and the guest molecules can be arranged only in the appropriate cavities of 
the designer crystals. Second, the interaction between the DNA lattice and the guest 
molecules should be robust and specific, so that the guest molecules can be treated as 
an integral part of the crystal lattice with atomic-level precision for further 
applications. One simple solution to both challenges is to use sequence-specific 
DNA-binding peptides or proteins as the guest molecules. The interactions between 
DNA and such peptides or proteins are highly specific and strong; they exist naturally 
and do not require artificial linkers between DNA and protein. Another advantage of 
using DNA-binding peptides as guest molecules is that they are generally small 
and can easily diffuse into the cavities of a crystal. Therefore, 
the growth of the designer DNA crystals can be separated from the soaking of the 
target DNA-binding peptides or proteins, so that the 
crystal lattice is preserved after the guest proteins are introduced. Another exciting 
prospect is constructing DNA/RNA and RNA/RNA self-assembled crystals by 
incorporating RNA strands into DNA crystals and adapting DNA motif designs to 
create RNA motifs. These hybrid crystalline materials will provide new applications 
in many areas, such as assisting the structural determination of small RNA 
structures, building molecular devices from functional RNA motifs, and studying 
RNA-protein interactions. 


3.2 Opportunities 


In addition to expanding the structural diversity of designer DNA crystals, one 
interesting topic is generating unconventional states of materials [45], such as 
quasicrystals and disordered hyperuniform DNA structures, in a controllable and 
programmable fashion. The mathematical description and simulation of using patchy 
particles to create quasicrystal patterns were reported [46] before the experimental 
realizations [47]. The well-studied DNA multi-arm junction tiles are ideal 
model systems for programmable self-assembly in 2D. Rationally designed 3D 
quasicrystals built from DNA tiles remain a challenge. Computational simulations using 
3D particles have been reported [48], and designer DNA quasicrystals are likely to be 
achieved in the foreseeable future. Other states of matter that have been introduced 
in recent years can be new design targets for engineering bioinspired and biomimetic 
systems in DNA nanotechnology. Such studies will enrich the types of programmable 
functional biomaterials, establish structure-function relationships for 
unconventional states of materials, and yield knowledge of fundamental self-assembly 
in such systems. 

The creation of DNA crystalline scaffolds with atomic resolution will provide 
precise spatial programmability of molecules, offering many application opportuni- 
ties for various research areas. For example, it is useful to create a 3D network of 
enzyme cascades or energy transfer pathways based on chromophores with accu- 
rate location and orientation by employing DNA crystal templates. 3D DNA crys- 
tals also serve as zeolite-like/nanoporous materials that are capable of catalysis and 
molecular separations. In addition, some applications require intimate knowledge 
of the atomic-level details of DNA structures, and the designer DNA crystals can 
provide such information. Although researchers have obtained cryo-EM 
reconstructions of various DNA complexes and revealed their structures 
at nucleotide resolution [49, 50], universal strategies have not yet been established 
for creating different types of designer DNA crystals. For instance, there is no 
crystal structure of the double crossover (DX) tile that is present in most DNA 
nanostructures, including DNA origami. Consequently, the deformation that crossovers 
introduce in their vicinity and the base-level impact of sequence on DNA structures 
are still unknown. Angstrom-level positioning accuracy at the scale of individual DNA 
bases will enable new opportunities such as chromophore placement in DNA-dye 
assemblies. Structural information gained from crystals can be used to improve 
DNA designs, leading to a greater understanding of DNA as a biomaterial. 

The potential applications of DNA crystals extend beyond their use as scaf- 
folds for crystallization. One example is using crystalline complexes as molecular 
‘sponges’ to detect or separate target molecules by trapping or binding them in the 
cavities. To expand the scope of DNA lattices to other fields, especially ones where 
biological materials may not be stable, integrating designer DNA crystals with other 
materials will be highly desirable. For example, it is possible to grow inorganic 
materials along DNA helices, so that the resulting products inherit 
the programmable geometric features of DNA crystals. As in biomineralization, 
inorganic materials such as silica can be deposited on the surface of DNA. 
Our recent study revealed that silica-DNA composite nanomaterials with 
a set of controllable morphologies can be created using chemical reactions [51]. This 
technique was further developed to coat silica shells on 3D origami lattices [52] and 
to create superconducting 3D materials by adding a layer of niobium on top 
of the silica layer [53]. This technique has the potential to be adapted to designer 
DNA crystals as well. One challenge is how to allow the deposited materials to 
permeate the 3D crystals through their relatively small cavities. The resulting porous materials 
should exhibit significantly improved mechanical properties due to the inorganic 
coating. By transferring the structural programmability of designer DNA crystals to 
inorganic materials, the nucleic acid-based concreating strategy will open exciting 
opportunities for novel nanofabrication with various application potentials. 


4 Transferring to RNA Structural Design 


An important landmark in the development of nanotechnology was the use of 
biological molecules as building blocks to construct devices with control of structure 
and function at the nanometer scale. RNA, which shares some general features with DNA, 
has played a unique role. Unlike DNA, RNA has an inherent architectural potential 
to form a wide variety of interactions far beyond Watson-Crick base pairing 
[54, 55]. Naturally existing 3D RNA molecules and man-made RNA building blocks or 
tiles [56, 57] that are known at atomic resolution can be modified, potentially 
providing a much larger toolkit for building a variety of highly complex structures. 
In addition, functionalities associated with RNA molecules, such as 
catalysis [58], gene regulation [59], and organization of proteins in large machineries 
[60], enable their use in material and biomedical sciences [61]. Most importantly, 
RNA molecules can be readily synthesized in cells through transcription [62]. 
Therefore, RNA nanostructures have great potential for building self-assembling 
nanodevices inside cells by utilizing the cellular nucleic acid synthesis pathway. 


4.1 Challenges 


Although RNA shares many general geometric and chemical features with DNA, 
the construction of custom RNA nanostructures has proven more difficult. For 
instance, it is still challenging to rationally design RNA objects with size and 
complexity comparable to natural RNA machineries or current highly sophisticated DNA 
nanostructures. The emerging field of RNA nanotechnology has attracted increasing 
attention from diverse research areas in recent years, and many RNA nanostructures 
have been constructed, including squares [63, 64], tubes [65], arrays [66, 67], and 3D 
objects [68, 69]. Most current RNA self-assembly methods rely on conserved, 
naturally evolved motifs with predictable tertiary structures. 
Natural RNA building blocks, called structural modules, can be combined and 
rearranged in a large number of ways into target shapes. With this method, however, 
the constructed RNA nanostructures are generally smaller than 200 
nucleotides, and the complexity of RNA designs has been limited as well [70]. As a 
complementary approach, a de novo design strategy offers higher versatility, 
allowing us to build structures and functions that do not exist in nature but fulfill 
our needs. De novo designed RNA nanostructures have emerged recently. One example is 
a designed RNA nano-prism that self-assembled from eight T-shaped RNA motifs 
[71]. These motifs have well-defined configurations and were created following the 
example of DNA T-junction tiles [72]. Compared with previous works that used 
naturally existing 3-way RNA junctions to construct polyhedra [71], this de novo 
design method provides a unique way to control features of synthetic RNA structures, 
such as their angles, lengths, and sequences. A recent example of the 
T-shaped RNA tile is a single-stranded version named branched kissing loops [73]. 
Sixteen different linear and circular assemblies have been constructed from the tile 
by adjusting the tile geometry. 

Self-folding of a single RNA chain into a defined complex nanostructure 
represents a new era in nucleic acid nanotechnology. Two-dimensional, unknotted 
folding of single-stranded RNA up to 6300 bases long was demonstrated using 
thermal annealing [74]. Single-stranded RNA knots of 1000 nucleotides were 
produced by hierarchical folding in prescribed orders, exhibiting an unprecedented 
number of complicated topological features [75]. One of the most exciting features of 
RNA molecules is their ability to fold cotranscriptionally, which offers attractive 
potential applications in synthetic biology. A single RNA chain folds during 
transcription under isothermal conditions, both in vitro and in cells. 
Rationally designed RNA assemblies that fold cotranscriptionally provide a new avenue 
for creating self-assembled RNA scaffolds that interface with synthetic biology 
and nanomedicine. Geary, Rothemund, and Andersen pioneered the RNA origami 
approach [67], enabling cotranscriptional assembly by arranging RNA helices in 
parallel through crossovers and kissing loops. Recently, the same team introduced 
automated RNA origami design software and extended the creation of large RNA tiles up 
to 2360 nucleotides [76], which represent the largest synthetic RNA structures 
folded cotranscriptionally to date. 


4.2 Opportunities 


Ever since RNA structures began to be characterized at higher resolution, the more we 
learn about RNA molecules, the more we realize how much we don’t know. There 
remains a tremendous amount of knowledge, and many rules, still to be discovered for RNA structural 
and functional design. DNA nanostructures generally need annealing, a temperature 
cooling process, to promote the formation of the designed Watson-Crick base 
pairs, leading to the minimal free energy (MFE) configurations. While natural 
RNA molecules don’t always fold into the MFE conformation [77, 78], the single- 
stranded RNA origami strategy [76] employs localized domain modules, optimizes 
MFE domains, and uses a multi-stage sequence optimization procedure to facili- 
tate the isothermal folding. Considering the complexity of natural RNA structures 
and RNA folding, it is promising yet challenging to discover new design strate- 
gies. Inspirations could come from the unique features of RNA. For instance, RNA 
structural design allows us to harness the kinetic energy in RNA assembly. 
Near-MFE configurations could possibly be achieved and stabilized by inserting 
local kinetic traps, adding protein-binding regions, and imposing topological constraints. 
Incorporating computational components into the RNA assembly process may also 
enable the creation of programmable dynamics. Particularly, in single-stranded RNA 
folding, we can imagine that inserting intramolecular strand displacement reactions 
in RNA sequence design could contribute to sequential and spatiotemporal control 
of dynamic assembly, such as on-the-fly single-chain folding. Thanks to the development 
of data science and machine learning, data-driven strategies have 
started to show their power in RNA folding prediction as well as in sequence generation 
for target structures. These computational tools offer an efficient means of exploring 
a sufficient number of RNA sequence-structure pairs to provide insights into what 
may work, thereby reducing the need for extensive wet-lab experimentation. 


5 At the End 


The Olympic motto, proposed in 1894, is the hendiatris Citius, Altius, Fortius, which 
means faster, higher, and stronger. The design, building, and assembly of DNA at 
nanoscales share a similar ethos with the Olympic games, striving for advance- 
ments in areas such as size, accuracy, versatility, adaptability and cost-effectiveness. 
Through DNA research, a single question can lead to many answers, and DNA 
assembly opens up endless possibilities for exploration. Nucleic acid nanostructures 
play roles ranging from nanomaterials to molecular devices and tools. As the field continues to 
progress, it is only a matter of time before we witness widely adopted real-world 
applications of nucleic acid nanotechnology. The question now is which one will be 
the first. 

Examples of exploiting DNA nanostructures as drug delivery vehicles have been 
demonstrated for targeted delivery with precise control of the structures, compo- 
nents, and reconfigurations [79-83]. Moreover, nucleic acid nanostructures have been 
investigated for other biomedical applications beyond drug delivery. For example, 
the cotranscriptionally folded single-stranded RNA origami showed significant anticoagulant 
activity, sevenfold greater than that of the free aptamer [84]. Researchers 
also developed a set of aptamer-decorated RNA origami with the ability to reverse 
the anticoagulation activity [85]. Single-stranded RNA origami has been used as an 
immunostimulatory reagent to stimulate the innate response for cancer immunotherapy 
[86]. In a recent case, a half-polyhedron shell formed from higher-order DNA 
origami assemblies works as a virus trap, with virus-binding moieties decorating 
90 sites on its interior surface [87]. More innovative usages of nucleic acid 
nanostructures should be explored to fully benefit from their structural, dynamic, and 
functional features. 

Individual 2D DNA origami has been placed precisely onto lithographic patterns 
for nanophotonic applications [88]. Arrays of DNA origami patterns with sizes larger 
than a few micrometers can be integrated with top-down lithographic patterning techniques 
to access even larger length scales. One can imagine that future 3D integration 
of DNA assemblies, such as 3D DNA superlattices, with top-down 
lithography could address the need to extend nanofabrication into the third 
dimension. 

The key to providing a vast library of described functionalities lies in creating 
hybrid systems that incorporate nucleic acids, proteins, lipids, quantum dots, and 
other functional materials. Adaptability issues are crucial for the successful creation 
of hybrid materials. With due consideration of efficacy, scalability, robustness, and 
safety, nucleic acid-directed assemblies and devices provide a multitude of possible 
applications that may produce novel materials with unique functionalities beyond 
our wildest imagination today. 

Abstract To celebrate the 40th anniversary of bottom-up DNA nanotechnology, 
we highlight the interaction of the field with mathematics. DNA self-assembly as a 
method to construct nanostructures gave impetus to an emerging branch of math- 
ematics, called here ‘DNA mathematics’. DNA mathematics models and analyzes 
structures obtained by bottom-up assembly, as well as the process of self-assembly. 
Here we survey some of the new tools from DNA mathematics that can help advance 
the science of DNA self-assembly. The theory needed to develop these tools is now 
driving the field of mathematics in new and exciting directions. We describe some 
of these rich questions, focusing particularly on those related to knot theory, graph 
theory, and algebra. 


1 Introduction 


Seeman’s ground-breaking work in DNA self-assembly that initiated the field of 
DNA nanotechnology has had surprisingly broad impacts on many other fields. In 
particular, it has impacted the field of nanotechnology more widely [1-4], the science 
of computing and molecular programming (e.g., [5-10] as well as articles in the proceedings 
of DNAx for over 26 years), and bottom-up nano-assembly [11-19], as well as 
computer science and mathematics [20-28]. Advances in the sciences often drive the 
creation of completely new mathematical fields. The field of DNA nanotechnology 
is a prime example of this phenomenon. It has been steadily spawning a slew of 
new mathematical problems. These problems are now giving birth to a new field of 
mathematics that might be called ‘DNA-mathematics’. 

We live in a world dominated by computers, mobile devices, advances in technology, 
and the internet. We have food webs, social networks, and contact tracing in 
epidemiological models. The mathematics of interconnections, in particular network 
and graph theory, has become essential for better understanding many situations in modern life. 
For example, graph drawing tools led to effective computer chip layouts, 
while random graphs are often used in modeling the World Wide Web. We now see 
this same phenomenon of new mathematics emerging from questions about shapes 
and interconnections driven by nucleic acid structures in biology and DNA nanotechnology. 

In his pioneering article on DNA self-assembly Ned Seeman proposed using DNA 
molecules, called branched junction molecules, to build complex nanostructures [29]. 
These molecules are shaped like starfish with anywhere from three to twelve arms 
and can attach to one another via sequences of complementary DNA bases at the 
ends of their arms (sticky ends), as though the starfish are holding hands. Thus, for 
example, four three-armed branched junction molecules can join together to form 
the outline of a tetrahedron. 

In 2006, a major advance in DNA self-assembly was obtained by Rothemund’s 
introduction of DNA origami [30], a method where a single-stranded DNA plasmid 
outlines a pre-designed shape and about 200-250 short strands of DNA complemen- 
tary to different locations of the plasmid are used to assemble and secure the structure. 
Because of the potential applications of self-assembling DNA nanotechnology, espe- 
cially in medicine, and also in nanoscale robotics, circuitry, and biosensors, hundreds 
of laboratories around the world today focus on it. 

Many challenges arise in designing DNA molecules that are to self-assemble into 
a desired shape. While some of the challenges involve chemical processes, many 
others involve structural questions such as which arms of the branched junction 
molecules should attach to which other arms, or how to route the scaffolding strand 
and staples through the desired structure, so that the smaller molecules then self- 
assemble into exactly the desired larger shape. As the experimental advances evolve 
rapidly, mathematical foundations to address new and upcoming questions that arise 
from laboratory experiments are becoming more essential. 

DNA self-assembly now involves novel mathematical approaches and tools that 
inform the design of the structure, both to assemble the nanocomplex and to 
ease the analysis of the experimental results. It becomes particularly exciting, from a 
mathematical perspective, when these approaches diverge from the original stimulus 
to problems of intrinsic mathematical interest independent of the initial application, 
thus significantly expanding the scope of the mathematical investigations. 

Fortunately, problems involving shapes and interconnections are at the heart of 
mathematics and mathematicians have become natural collaborators to DNA self- 
assembly researchers. New mathematical formalism is being developed to solve 
mathematical problems arising from DNA self-assembly. Many of the target molec- 
ular shapes have wireframe structures, such as the outlines of a cube [31] or octa- 
hedron [32], or 2D and 3D lattices [11, 14], or even triangular mesh bunny rabbits 
and lacy snowflakes [33]. Since the outlines of these wireframe structures corre- 
spond to the edges of a graph and the corners of the shape to the vertices of a graph, 
graph theory (in particular topological graph theory) and knot theory have emerged 
as an excellent platform to study assembly problems. However, existing graph and 
knot theory lack sufficient descriptive properties to capture the essence of DNA self-assembly 
and cannot always address the new problems arising from the self-assembly 
application. Thus, a new subfield in mathematics is born, DNA mathematics, which 
develops new mathematical tools for, and arises from, bottom-up assembly. 

As shared in the sections below, the mathematical problems driven by self- 
assembly processes are rich and prolific. They lead discrete mathematics and topol- 
ogy beyond current mainstream trends in these fields and hence break open new 
directions such as edge-outer embeddability, origami knotting, and new algebraic 
languages to describe structures. These theoretical directions are open-ended, gener- 
ally scale independent, and will lay a broad foundation for future growth applicable 
to self-assembly in many settings, from nano to macro. 


2 Flexible Tiles and New Graph Invariants 


Fig. 1 Three-armed branched DNA molecule seen as a three-valent vertex in a graph with three half-edges 

The first three-dimensional DNA structures, such as the 
cube [31] and the truncated octahedron [32], appeared at the same time as the idea of 
using nucleic acids and biomolecules for computation, when molecular-based information 
processing was initiated with Adleman’s seminal paper [34]. If computation 
is to be performed with molecular structures, assemblies of arbitrary 3D wireframe 
or graph-like molecules without inherent symmetry may be necessary. The first such 
construct of a graph with six vertices using branched junction molecules representing 
vertices, and regular duplex molecules representing edges, was reported in [35, 36]. 
These molecules contained non-paired nucleotides throughout the duplexes making 
the arms of the molecules flexible such that, in the process of assembly, their sticky 
ends could easily join their respective complements. After ligating the nicks (breaks 
in the strands), the resulting (graph) structure consisted of a single cyclic DNA strand, 
which could conform in at least two knotted topologies [36]. These experimental 
results initiated a mathematical model that captures some of the design challenges of 
the flexible armed tiles used in construction of spatially embedded graph structures. 

A combinatorial abstraction that consists of a vertex with half-edges labeled by 
the sticky-end types on the arms of the branched junction molecule is called a tile 
(Fig. 1). It can be denoted as a multiset of bond types indicating the sticky-end 
types at the ends of the arms. The complementary sticky-end types are 
denoted by marking bond types with two complementary versions, indicated with a 
symbol from an alphabet (e.g., $a, b, c, \ldots$) and a hatted version of the symbol (e.g., 
$\hat{a}, \hat{b}, \hat{c}, \ldots$). Multiple entries of the same bond type are indicated by an exponent on 
the corresponding symbol. For example, Fig. 2a shows tiles $t_1 = \{\hat{a}, b, \hat{c}\}$, $t_2 = \{a^3\}$, 
$t_3 = \{\hat{a}, \hat{b}, \hat{c}\}$, and $t_4 = \{\hat{a}, c^2\}$. 

Fig. 2 a Four tiles with their corresponding bonding types in colored arrowheads appearing at the 
end of the half-edges. The complementary bond types are indicated with reversed arrowheads. b A 
tetrahedron realized by a pot containing the tiles in (a) 

A collection of tile types forms a pot, and a target graph G (or other 3D wireframe 
structure) is realized by a pot if it can be obtained by matching the vertices of G with 
tile types and identifying two half-edges of tiles having complementary versions of 
the same bond type with an edge of the graph. In the most basic setting G is an abstract 
graph, but other models also consider the geometry of the target graph. Each pair of 
half-edges forming an edge is subject to the restriction that a symbol is always paired 
with its hatted version and vice versa. Abstractly, a graph is considered assembled 
from tiles, if every vertex of the graph corresponds to a vertex of a tile and each edge 
corresponds to a pair of half-edges with complementary sticky ends joined together 
to form a bond edge. This is equivalent to finding an edge-labeled orientation of the 
graph, with the arrows pointing from the unhatted to the hatted half-edges making 
up the labeled edge. In theory, any covering of a graph with a cycle realized by a pot 
can also be realized by the pot, but entropy disfavors these larger constructs. Further 
description of the model can be found in [28, 37-39]. 
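
The realization condition above can be sketched as a brute-force check for very small graphs. This is only an illustration under stated assumptions: the function names `complement` and `realizes`, the convention of writing a hatted bond type with a trailing `^`, and the example pot (a tetrahedron pot in the spirit of Fig. 2) are all introduced here and are not taken from the cited models.

```python
from itertools import permutations, product

def complement(label):
    """Return the complementary bond type; hatted types carry a trailing '^' (our convention)."""
    return label[:-1] if label.endswith("^") else label + "^"

def realizes(pot, edges):
    """Brute-force test of whether tiles from `pot` can realize the graph given by `edges`.

    Each vertex receives one tile type (types may be reused), each arm of the tile is
    assigned to one incident edge, and every edge must join complementary half-edges.
    Exponential time: intended only for tiny examples.
    """
    vertices = sorted({v for e in edges for v in e})
    incident = {v: [e for e in edges if v in e] for v in vertices}
    for choice in product(pot, repeat=len(vertices)):
        # A tile fits a vertex only if it has exactly one arm per incident edge.
        if any(len(t) != len(incident[v]) for t, v in zip(choice, vertices)):
            continue
        # Try every distinct way of assigning each tile's arms to its incident edges.
        for arms in product(*[set(permutations(t)) for t in choice]):
            label = {}  # (edge, endpoint) -> bond type on that half-edge
            for v, arm in zip(vertices, arms):
                for e, bond in zip(incident[v], arm):
                    label[(e, v)] = bond
            if all(label[(e, e[0])] == complement(label[(e, e[1])]) for e in edges):
                return True
    return False

# Hypothetical tetrahedron pot and the complete graph on four vertices.
TETRAHEDRON = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
POT = [("a^", "b", "c^"), ("a", "a", "a"), ("a^", "b^", "c^"), ("a^", "c", "c")]
print(realizes(POT, TETRAHEDRON))
```

A successful assignment found by this search corresponds to an edge-labeled orientation of the graph, with each edge directed from its unhatted to its hatted half-edge.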

There are two aspects to consider in such a ‘pot’ and ‘tile’ set-up. One aspect 
considers properties of the pot: what types of graphs can be realized by a pot, how 
many isomorphism classes are there, are all structures that are realized complete (i.e., 
there are no unbonded half-edges), or can they always be completed, etc. The other 
aspect asks questions about the graphs and the structures: what is the most ‘efficient’ 
pot (in terms of minimal number of tile types needed and bond types needed) that 
can realize a given graph, how are the properties of the pot related to other invariants 
of the graph in question, etc. 

The properties of pots are most suitably studied with methods from linear algebra 
by associating a matrix to each pot whose ij-entries are ratios of bond type i for a 
tile type j present in a pot. Using this matrix, one can determine the stoichiometry 
of the self-assembly to ensure certain types of pot properties [39] (see also [37]). 
Although useful, this linear algebra approach does not address the difficult task of 
better understanding the process of assembly. As larger complexes form in a 
test tube, the thermodynamic properties can change and further assembly of larger 
structures may depend on the entropic conditions in the test tube; hence an expansion 
of the model to include thermodynamic properties may be necessary. One can also 
associate computational complexity classes with types of pots, where the number 
of tiles used in a structure that computes a problem is a base for the associated 
classes [23]. It was shown that 3-colorability of a graph, or other NP-hard¹ problems, 
could be solved in O(1) bio-steps [23, 36] by physically assembling the structure 
that emulates the solution of the problem. This was also shown experimentally by 
Seeman’s lab [7]. However, it seems we are yet to develop a good model that can 
encapsulate information processing by 3D structures, or a model that can tell us about 
computations by shapes. 
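
The stoichiometric idea behind the linear algebra approach mentioned above can be illustrated with a simple balance check. The sketch below does not reproduce the matrix of [39]; it only tests a necessary condition that follows directly from the model, namely that in a complete complex every unhatted half-edge is paired with a hatted one of the same bond type. The function names, the trailing `^` convention for hatted types, and the example pot are assumptions made for illustration.

```python
from fractions import Fraction

def net_bond_balance(pot, proportions):
    """For each bond type, return (unhatted arms) - (hatted arms), weighted by tile proportions."""
    balance = {}
    for tile, share in zip(pot, proportions):
        for bond in tile:
            base = bond.rstrip("^")                   # bond type without the hat marker
            sign = -1 if bond.endswith("^") else 1    # hatted arms count negatively
            balance[base] = balance.get(base, Fraction(0)) + sign * share
    return balance

def is_balanced(pot, proportions):
    """True if the proportions sum to 1 and every bond type has as many hatted as unhatted arms."""
    return sum(proportions) == 1 and all(
        value == 0 for value in net_bond_balance(pot, proportions).values()
    )

# Hypothetical tetrahedron pot; hatted bond types are written with a trailing '^'.
POT = [("a^", "b", "c^"), ("a", "a", "a"), ("a^", "b^", "c^"), ("a^", "c", "c")]
quarter = Fraction(1, 4)
print(is_balanced(POT, [quarter] * 4))
```

Exact rational arithmetic is used so that the balance test is not affected by floating-point rounding.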

Determining an ‘efficient’ pot for a target graph G means specifying a pot of 
tiles that realizes G combinatorially, with a minimum number of tile types, and using 
a minimum number of bond-edge types. Recall that in the assembly process there 
are typically ~ 10! tiles of each type present in a tube, each tile can be used within 
a structure multiple times (or, as is often assumed, an arbitrarily large number of times). For 
the construction of G one can require an edge labeled orientation of the graph using 
a minimal alphabet for the labels. This becomes a new graph invariant B(G), the 
minimal bond alphabet. In order to prevent the self-aggregation of tiles during initial 
synthesis, a natural experimental requirement can be added so that no two half- 
edges of the same tile have complementary labels. This corresponds to prohibiting 
loops in the target graph. One can further consider the minimal set of vertex types 
(the resulting molecular components) needed for construction of the graph G, an 
invariant T(G). The invariants T(G) and B(G) are new graph invariants of intrinsic 
interest that have yet to be determined for most graphs. Except for a few special 
graph classes in some specific pot settings, these combinatorial questions are wide 
open. Familiar graph theoretical tools such as coloring, chromatic numbers, classical 
graph automorphism, etc., do not appear to determine T(G) or B(G) in any of the 
settings. The chromatic number provides a lower bound in some settings, although 
a poor one as it and T(G) can be arbitrarily far apart [28]. 

The problem can further be confounded by the constraints of several different 
experimental settings, including the degree of flexibility or rigidity of the arms and 
the strength of the cohesive sites, as well as yield considerations such as whether or 
not the incidental creation of complete complexes smaller than the target graph is 
acceptable. In [40] it was confirmed that it is NP-hard to determine the output of a pot 
and to determine whether a pot that assembles a target structure will also assemble 
unwanted smaller structures. 

Mathematical formalism, models, and design tools for types of self-assembly 
can be found in [28, 37, 38, 40, 41], with [42] further specifying the inter-arm 
angles and cohesive end rotation orientations for rigid tiles. These resources provide 
provably optimal design strategies, i.e., combinatorial specifications for the minimum 
number of cohesive regions and the minimum number of branched junction molecule 
types needed to assemble a given wireframe target structure, for several common graph 
classes. Further computations of these invariants for specific graphs such as Platonic 
and Archimedean solids and prisms can be found in [43]. A further extensive body of 
work can be found on DNA Wang tiles, which are the restrictive case of completely 
rigid four-armed tiles. See, among others, Refs. [5, 44-49]. 

¹ Loosely speaking, NP-hard problems are those shown to be at least as difficult as a large set of 
other problems for which no fast algorithms are known. 

Fig. 3 Changing strand connections at a three-valent vertex in a graph. As DNA strands have 
polarity with a phosphate end (5’) and a hydroxyl group on the other end (3’), the arrowheads 
indicate the 3’ ends. The strands bind with opposite polarities. A three-armed junction molecule 
can exhibit two types of connections as depicted, e.g., the green strand has a direction left to right 
in the left figure, while it has direction left to top in the right figure 


3 DNA Strand Routing and Topological Graph Theory 


Determining routes for a scaffolding strand throughout assembly targets is integral to 
both DNA origami [27, 50, 51] and experimental verification of graph constructs [7]. 
A vertex of a graph structure can be traced by DNA in multiple ways such that the 
resulting DNA structure is assembled by different sets of cross-hybridizing cyclic 
molecules. 

For a three-valent vertex, the DNA structure representing such a vertex can have two 
local strand configurations (vertex connections) as shown in Fig. 3. One configuration 
can be represented by ‘non-crossing’ strands at a vertex, and the other is obtained by 
‘crossing’ strands. By changing vertex connections, the number of cyclic molecules 
assembling the structure can vary. Figure 4a and b show two strand outlines of the 
same triangular prism graph. In Fig. 4a, the strand connections at each vertex in this 
planar representation do not ‘cross’ each other and the graph is outlined by five 
cyclic strands. By changing the strand connection at vertices v1 and v4 to the other 
configuration, the graph can be routed by a single circuit, hence outlined by a single 
cyclic molecule (as shown in Fig. 4b). 
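
The effect of changing vertex connections on the number of strands can be sketched with a rotation system, i.e., a cyclic order of neighbors at each vertex, whose facial walks play the role of the cyclic strands. The code below is an illustration only: the vertex labels 0 to 5, the particular planar rotation system, and the function names are assumptions made here and do not correspond to the labels used in Fig. 4. Reversing the cyclic order at a three-valent vertex models switching between the two connections of Fig. 3.

```python
def count_strands(rotation):
    """Count the facial walks of the embedding described by `rotation`.

    `rotation[v]` lists the neighbors of v in cyclic order (simple graphs only).
    A walk arriving at v from u leaves toward the neighbor that follows u at v.
    """
    darts = {(u, v) for u, nbrs in rotation.items() for v in nbrs}
    seen = set()
    strands = 0
    for start in sorted(darts):
        if start in seen:
            continue
        strands += 1
        u, v = start
        while (u, v) not in seen:
            seen.add((u, v))
            nbrs = rotation[v]
            w = nbrs[(nbrs.index(u) + 1) % len(nbrs)]  # successor of u around v
            u, v = v, w
    return strands

def flip(rotation, *vertices):
    """Return a copy of `rotation` with the cyclic order reversed at the given vertices."""
    flipped = {v: list(nbrs) for v, nbrs in rotation.items()}
    for v in vertices:
        flipped[v].reverse()
    return flipped

# Triangular prism: outer triangle 0-1-2, inner triangle 3-4-5, spokes 0-3, 1-4, 2-5,
# with the counterclockwise neighbor order of one planar drawing.
PRISM = {0: [1, 3, 2], 1: [2, 4, 0], 2: [0, 5, 1],
         3: [0, 4, 5], 4: [5, 3, 1], 5: [3, 4, 2]}

print(count_strands(PRISM))              # 5 strands in the planar routing
print(count_strands(flip(PRISM, 0, 4)))  # 1 strand after changing two vertex connections
```

Flipping a single vertex in this example gives three strands, so the strand count drops by two with each of the two changes.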

Early work focused on the number of cyclic molecules in such double-strand covers, 
where the target graph is covered by a set of circuits so that every edge is covered 
exactly twice, as in Fig. 4a [52]. These circuits form the facial walks of an oriented 
embedding of the graph, and thus correspond to a strong version of the cycle double 
cover conjecture, as described below. Double-strand covers have re-emerged recently 
in novel origami methods such as [53]. 

Fig. 4 Possible routes for DNA strands in a triangular prism graph: a Following the faces of a plane 
embedding produces 5 faces, corresponding to a maximum strand number of 5. b By changing 
strand connections at two vertices, the graph is traced by a single strand, reflecting that it is upper- 
embeddable on a double torus (see Fig. 5a). c A minimal length reporter strand for the graph, giving 
an edge-outer embedding on the torus (see Fig. 5b) 


In DNA settings, strand routings of structures require turns at vertices that are 
constrained, i.e., routes may not double back on an edge. Because DNA strands are 
oriented, any repeated edges in a strand route must be traversed in the opposite direction 
when revisited. Thus, when folding a DNA origami into a graph structure, the strand 
routing objective is to find a route in the graph that meets these constraints and traverses 
every edge, ideally with a minimum number of repeated edges. A similar problem 
arises in determining a route for the reporter strand, where after an experiment has 
been conducted, a single strand traversing the structure is extracted from the assem- 
bled construct and analyzed to yield the experimental data (see [7]). Here again, the 
objective is a route covering the graph with a minimum number of repeated edges. 
Such a minimal strand routing for the triangular prism graph is shown in Fig. 4c. In 
this case, the connections at three vertices, vo, v3, and v4 are changed relative to the 
connections in Fig. 4a. 
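
The routing constraints described above can be collected into a small validity check. This is a minimal sketch under our reading of the constraints (closed walk, every edge covered, each edge traversed at most once in each direction, no immediate doubling back); the function name and the representation of a route as a list of vertices are assumptions introduced here.

```python
def is_valid_route(edges, route):
    """Check a closed walk, given as a vertex list with route[0] == route[-1].

    Requirements checked:
      * every step follows an edge of the graph and every edge is covered;
      * no edge is traversed twice in the same direction, so a repeated edge
        is revisited in the opposite direction;
      * the walk never doubles back, i.e., u -> v is never followed by v -> u.
    """
    edge_set = {frozenset(e) for e in edges}
    steps = list(zip(route, route[1:]))
    if not steps or route[0] != route[-1]:
        return False
    if any(frozenset(step) not in edge_set for step in steps):
        return False
    if {frozenset(step) for step in steps} != edge_set:
        return False
    if len(set(steps)) != len(steps):
        return False
    # Compare each step with the next one, wrapping around because the walk is closed.
    for (u, v), (x, y) in zip(steps, steps[1:] + steps[:1]):
        if (x, y) == (v, u):
            return False
    return True

TRIANGLE = [(0, 1), (1, 2), (2, 0)]
print(is_valid_route(TRIANGLE, [0, 1, 2, 0]))
```

Minimizing the number of repeated edges among all valid routes is the hard part of the problem; this check only verifies a candidate route.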

The strand routing problem is in general intractable, even in the special case of 
an Eulerian surface mesh when an optimal route follows face boundaries. In this 
case the problem corresponds to finding A-trails in the surface, which are Eulerian 
circuits that turn either left or right at each vertex. This problem is known to be NP- 
hard (see [54]) even on the plane. In [51] it is proven that the strand routing problem 
remains NP-hard even for graphs of maximum degree 8. A survey of approaches 
to the strand routing problem is given in [27]. The problem can be translated to the 
traveling salesman problem (TSP), which opens the wealth of software available for 
solving the TSP to DNA origami applications. 

A notable alternative approach is given in [33]. Provided that the graph is an 
augmented triangulation of a surface topologically equivalent to a sphere, there is a 
fast algorithm for routing the scaffolding strand, albeit at the cost of duplicating a 
number of edges in the final nanostructure. 

Mathematically, different routings correspond to different cellular embeddings 
of a graph in a surface (drawings of a graph in a surface such that its edges do 
not cross each other and such that the complement of the graph in the surface is 
a set of topological disks). In general, the number of strands outlining a graph 
decreases as the genus² of the surface in which the graph is embedded increases. 
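
The relation between strand number and genus can be made concrete with Euler's formula for a cellular embedding in an orientable surface. The symbols $V$, $E$, $F$ (numbers of vertices, edges, and faces) and $g$ (genus) are introduced here for illustration; the worked values are for the triangular prism graph.

```latex
V - E + F = 2 - 2g
\quad\Longrightarrow\quad
F = 2 - 2g - V + E .
% Triangular prism: V = 6, E = 9, hence F = 5 - 2g.
g = 0:\; F = 5, \qquad g = 1:\; F = 3, \qquad g = 2:\; F = 1 .
```

These values agree with the five strands of the plane embedding and the single strand of the double-torus embedding described for Fig. 4.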


² Intuitively, the genus of an orientable surface is the number of holes, or handles, it has, so that a 
torus has genus one, a double torus has genus two, and so forth. 


Fig. 5 Embeddings of the triangular prism graph in Fig. 4. a Embedding of the graph in a double 
torus. The graph can be realized as a tertiary structure of a single DNA strand, shown here tracing 
the single face. b An embedding of the graph in a torus with a minimum length reporter strand tracing 
a face boundary that covers all edges at least once 


As has repeatedly been the case with self-assembly applications, a wealth of new 
mathematical directions have emerged from the design objective, with edge-outer 
embeddability as described below being a particularly rich example expanding on 
topological graph theory. 

An edge-outer embedding is a cellular embedding of a graph in an orientable 
surface where every edge lies on a special ‘outer’ face (there may be other faces too). 
Figure 5 shows edge-outer embeddings of the triangular prism graph in the double 
torus and in the torus, corresponding to the reporter strand routings in Fig. 4b, c. In [55] 
and [56] it was shown that the facial walk of an edge-outer face in an edge-outer 
embedding of the target graph exactly captures reporter strand routing constraints. 
While outer-planar and outer-projective-planar graphs, i.e., graphs in the plane or pro- 
jective plane with all vertices on a single distinguished face, have been heavily studied 
[57], edge-outer embeddable graphs are an entirely new, yet very natural, construct. 

Conventional graph theoretical tools do not apply directly to edge-outer embed- 
dability. For example, the Chinese postman problem lacks the bidirectional constraint 
while upper embeddability results require every edge to be covered exactly twice [58]. 
That every graph has such a reporter strand route, and hence an edge-outer embedding, 
was shown in [55], with a short algorithmic proof given in [56]. At the heart of 
the proof is the reconfiguration shown in Fig. 3, which changes the cyclic order of 
the edges about a vertex. Finding a minimum such route, that is, an embedding with 
a smallest possible edge-outer face, is in general NP-hard [56]. 

Such a computational complexity observation introduces a wealth of further open 
questions. If the graph is Eulerian, an optimal reporter strand route is simply an Euler 
circuit. Thus, in this case, Fleury’s algorithm provides a polynomial time solution to 
the problem. For what classes, other than Eulerian graphs, are there polynomial-time 
algorithms to find optimal reporter strand routes? Does there exist a polynomial-time 
algorithm guaranteed to return a route that is within x% of the optimal length for 
some reasonable x? 
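
For the Eulerian case, an optimal route can be computed directly. The sketch below uses Hierholzer's algorithm rather than Fleury's algorithm named in the text; both find Euler circuits in polynomial time, and Hierholzer's is simply shorter to write. The function name, the edge-list input format, and the octahedron example are assumptions made for illustration.

```python
def euler_circuit(edges, start):
    """Return an Euler circuit as a list of vertices, using Hierholzer's algorithm.

    `edges` is a list of (u, v) pairs of a connected graph in which every vertex
    has even degree; a ValueError is raised otherwise.
    """
    adjacency = {}
    for index, (u, v) in enumerate(edges):
        adjacency.setdefault(u, []).append((v, index))
        adjacency.setdefault(v, []).append((u, index))
    if any(len(nbrs) % 2 for nbrs in adjacency.values()):
        raise ValueError("a vertex has odd degree, so the graph is not Eulerian")
    used = [False] * len(edges)
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        # Discard edges at v that were already traversed from their other endpoint.
        while adjacency[v] and used[adjacency[v][-1][1]]:
            adjacency[v].pop()
        if adjacency[v]:
            w, index = adjacency[v].pop()
            used[index] = True
            stack.append(w)
        else:
            circuit.append(stack.pop())
    if len(circuit) != len(edges) + 1:
        raise ValueError("the graph is not connected")
    return circuit[::-1]

# Octahedron graph: six vertices, each adjacent to all others except its antipode.
ANTIPODES = {(0, 1), (2, 3), (4, 5)}
OCTAHEDRON = [(u, v) for u in range(6) for v in range(u + 1, 6) if (u, v) not in ANTIPODES]
print(euler_circuit(OCTAHEDRON, 0))
```

Because an Euler circuit uses every edge exactly once, it has no repeated edges and, in a simple graph, can never double back.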

It is also natural, both mathematically and for the application, to consider maxi- 
mum length routes. If the graph is upper embeddable, i.e., has an orientable one-face 
embedding, then this embedding is a maximum edge-outer embedding with every 
edge covered twice in the facial walk (see Fig. 4b, c); however, not every graph is 
upper embeddable. As with the dichotomy in the complexity of maximum and mini- 
mum genus of the graph, it is possible that the maximum length problem is tractable. 


4 DNA Origami and New Algebraic Structures 


Fig. 6 A segment of a DNA 
origami structure with black 
scaffold and colored staples. 
The generators of the 
proposed monoid are boxed 


Since their introduction to the scientific community, DNA origami structures [59] 
have become one of the most prevalent experimental substrates for a variety of 
3D structures [12]. The method uses a single-stranded DNA plasmid vector as a 
scaffold that outlines the shape of the desired structure with the help of so-called staple 
strands. The staple strands are short segments of single-stranded DNA (about 24 to 32 
nucleotides long) that span two or three segments of the scaffold strand to bind 
it in place. In order to achieve stability of the construct, it is necessary that the staple 
strands cross each other within the structure in an antiparallel way. An example 
of a portion of an origami structure is depicted in Fig. 6 (figure adjusted from 
[30] with added boxes). Systematic mathematical methods to describe DNA origami 
structures have not been established. An algebraic language that describes DNA 
origami, motivated by the Temperley-Lieb algebra that has been extensively used in 
physics and knot theory, particularly with the Jones polynomial and the Kauffman 
bracket, was introduced in [60, 61]. Such languages could provide a method for 
modifying (through strand displacements) a given design to achieve either a more 
effective and stable structure or a completely new geometric shape. 

A well-studied Jones monoid $J_n$ has $n$ generators $u_1, \ldots, u_n$. Each generator 
can be represented with $n + 1$ lines such that $u_i$ has a cap/cup connection of lines $i$ and 
$i + 1$, as shown in Fig. 7a, while all other lines are vertical. The product of generators is 
diagrammatically represented by concatenating the corresponding diagrams vertically. 
Figure 7b shows the product $u_i u_{i+1} u_i$ to the left of the equality, where the vertical 
lines have not been depicted. 

The set of relations of $J_n$ is depicted in Fig. 7b-d. For example, (b) represents the 
relation $u_i u_{i+1} u_i = u_i$. Although the generators and relations are written symbolically, 
they correspond directly to the depicted diagrams and planar isotopy. Taking 
advantage of this diagrammatic usage, we use a monoid version that is a generalization 
of $J_n$ for describing DNA origami. 
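
The diagrammatic product can be sketched by representing each diagram as a pairing of its top and bottom endpoints and stacking diagrams. This is an illustration under stated assumptions: the names `generator` and `multiply`, the endpoint encoding `("top", k)` and `("bot", k)`, and the choice to discard closed loops formed in the middle are introduced here, following the usual monoid convention rather than any code from the cited works.

```python
def generator(i, lines):
    """Diagram of u_i on `lines` lines: a cap and a cup joining lines i and i + 1 (1-based)."""
    pairing = {}

    def link(p, q):
        pairing[p] = q
        pairing[q] = p

    for k in range(1, lines + 1):
        if k not in (i, i + 1):
            link(("top", k), ("bot", k))   # all other lines are vertical
    link(("top", i), ("top", i + 1))       # connection along the top edge
    link(("bot", i), ("bot", i + 1))       # connection along the bottom edge
    return pairing

def multiply(a, b, lines):
    """Stack diagram `a` above diagram `b` and return the resulting pairing.

    Each outer endpoint is followed through the two diagrams until it reaches
    another outer endpoint; closed loops in the middle are simply dropped.
    """
    result = {}
    for side in ("top", "bot"):
        for k in range(1, lines + 1):
            in_upper = side == "top"       # top endpoints belong to a, bottom ones to b
            point = (side, k)
            while True:
                point = (a if in_upper else b)[point]
                if point[0] == ("top" if in_upper else "bot"):
                    break                  # arrived at an outer endpoint
                # Otherwise we are on the middle line: cross into the other diagram.
                point = ("top" if in_upper else "bot", point[1])
                in_upper = not in_upper
            result[(side, k)] = point
    return result

LINES = 4  # n + 1 lines for n = 3 generators
u1, u2, u3 = (generator(i, LINES) for i in (1, 2, 3))
print(multiply(multiply(u1, u2, LINES), u1, LINES) == u1)
```

The same two functions can be used to confirm the idempotent and far-commutation relations of the monoid on small examples.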
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Fig.7 Generator u; in (a) and three types of relations in (b, c, d) of the Jones monoid. b Conjugacy 
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Fig.8 a Two types of generators, cross-over staples connecting straight-line portions of the scaffold 
(denoted with œ) and straight staples connecting two cross-over turning tips of the scaffold (denoted 
with £). b Multiplying generators is represented by structures connecting scaffold segments and 
respective staples, when they don’t ‘cross-over’ the scaffold strand. c An idempotent rule; the 
product aq has the top and bottom structure the same as a. The cyclic portion in the middle does 
not affect further products 
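
The diagrammatic product of the Jones monoid can be checked mechanically. The following is a minimal sketch, not taken from [60, 61]: it encodes each diagram on m = n + 1 lines as a pairing of endpoints, composes diagrams by stacking them, and discards closed loops in the middle. The function names (generator, compose, jones_monoid) and the endpoint encoding are assumptions made for illustration. The relations tested are those of the standard presentation of the Jones monoid: u_i u_i = u_i, u_i u_j u_i = u_i for |i - j| = 1, and u_i u_j = u_j u_i for |i - j| ≥ 2.

```python
def generator(i, m):
    """Diagram of u_i on m lines (1 <= i <= m - 1) as a symmetric pairing.

    Endpoints are ('t', k) on the top edge and ('b', k) on the bottom edge.
    """
    pairing = {}
    for k in range(m):
        if k in (i - 1, i):
            continue
        pairing[('t', k)] = ('b', k)      # vertical line
        pairing[('b', k)] = ('t', k)
    for side in 'tb':                      # turn-back arc on each edge
        pairing[(side, i - 1)] = (side, i)
        pairing[(side, i)] = (side, i - 1)
    return pairing


def compose(a, b):
    """Stack diagram a above diagram b; closed loops in the middle are dropped."""
    result = {}
    outer = [(p, True) for p in a if p[0] == 't']
    outer += [(p, False) for p in b if p[0] == 'b']
    for start, in_upper in outer:
        point = start
        while True:
            nxt = (a if in_upper else b)[point]
            if nxt[0] == ('t' if in_upper else 'b'):
                break                      # reached an outer endpoint
            # nxt lies on the glued middle edge: continue in the other diagram
            point = ('t' if in_upper else 'b', nxt[1])
            in_upper = not in_upper
        result[start] = nxt
    return result


def jones_monoid(m):
    """All diagrams generated by u_1, ..., u_{m-1} together with the identity."""
    identity = {(s, k): ('b' if s == 't' else 't', k)
                for s in 'tb' for k in range(m)}
    gens = [generator(i, m) for i in range(1, m)]
    seen = {frozenset(identity.items()): identity}
    frontier = [identity]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = compose(x, g)
            key = frozenset(y.items())
            if key not in seen:
                seen[key] = y
                frontier.append(y)
    return list(seen.values())
```

For instance, with m = 4 the call compose(compose(generator(1, 4), generator(2, 4)), generator(1, 4)) returns the same pairing as generator(1, 4), which is the relation of Fig. 7b in this encoding.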


The basis of the origami algebraic language is a monoid O_n inspired by J_n. It is 
defined by generators and relations whose diagrammatic representations and closures 
are parts of DNA origami with n + 1 scaffold strands tracing the structure (e.g., in 
Fig. 6 the structure is traced with 6 passings of the scaffold across top to bottom, 
and one can consider the structure as a representative of O_5). The monoid has two 
types of generators, those that correspond to the scaffold strands connecting straight 
vertically in the structure (α’s) and those that correspond to staple strands connecting 
across (β’s). The generators α_i and β_i of the origami monoid O_n of n strands are 
depicted in Fig. 8a, where the cap and cup are placed at the i-th position from the 
left, and the vertical lines surrounding these positions are not depicted. 

We define a DNA origami monoid O_n on n + 1 strands by generators and relations 
as follows. Generators of O_n are α_i, for i = 1, ..., n. The α_i’s represent local DNA 
foldings of a pair of staples of the form of cap and cup as indicated on the left in 
Fig. 8a. An additional set of generators β_i corresponds to pairs of caps and cups of 
the scaffold strand as indicated in the figure to the right. The index i indicates that 
the caps and cups are between the i-th and (i + 1)-st strands of the structure. The set of 
relations is defined analogously to those of J_n, while the closure of the diagram is 
defined in a manner similar to the so-called plat closure in knot theory [62]. 

To justify modeling DNA origami structures by words over the generators, we make 
a correspondence between concatenations of generators α_i, β_i and connections of 
DNA segments. For a natural number n ≥ 2, the set of generators of the monoid O_n 
is the set Σ_n = {α_1, α_2, ..., α_n, β_1, β_2, ..., β_n}. For a product of two generators x_i 
and y_j in Σ_n, we place the diagram of the first generator above the second, lining 
up the scaffold strings of the two generators, and then we connect the respective 
scaffold strands. If the two generators are ‘far’ apart, that is, if for indices i and j 
we have |i − j| ≥ 2, then no staple connection is performed. If the two generators 
are adjacent, that is, if for indices i and j it holds that |i − j| ≤ 1, then we connect 
part of the staples. Since the staples are not too long within an origami structure, 
spanning only two or three segments of the scaffold, a convention of connecting 
staples representing a product of generators is motivated by the manner in which 
staples connect within the DNA origami structure. The staples of -type generators 
protrude “outside” of the scaffold in Fig. 8a. The staples are connected everywhere 
except when two non-extending staple-ends would have to cross a scaffold to connect. 
With this convention, α_i β_i and β_i α_i are two distinct words, i.e., the 
α’s and β’s do not freely commute. The rules for connecting scaffold strands and 
staples ensure that concatenation of three or more generators is associative. The 
graphical representation of a product α_i β_i is shown in Fig. 8b, where the vertical 
strands surrounding the generators are not depicted. 
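
As a bookkeeping sketch only, words over the generator set (written Σ_n here) can be modeled as tuples of letters. The names alphabet, staples_interact and word are hypothetical, and the sketch does not implement the relations of [61]. It reads the two index conditions as |i − j| ≥ 2 ('far' apart) and |i − j| ≤ 1 (adjacent), an assumption chosen so that the two cases cover every pair of indices.

```python
def alphabet(n):
    """Generators of O_n as (kind, index) pairs, with index i = 1, ..., n."""
    return [(kind, i) for kind in ('alpha', 'beta') for i in range(1, n + 1)]


def staples_interact(x, y):
    """True for adjacent generators (|i - j| <= 1), whose staples are partly
    connected; False for generators that are 'far' apart (|i - j| >= 2)."""
    return abs(x[1] - y[1]) <= 1


def word(*letters):
    """A word is a tuple of generators; concatenating words is tuple addition."""
    return tuple(letters)


a1, b1, b3 = ('alpha', 1), ('beta', 1), ('beta', 3)
w1 = word(a1, b1)   # alpha_1 beta_1
w2 = word(b1, a1)   # beta_1 alpha_1, a different word
```

Because tuple addition is associative, concatenation of three or more words is automatically associative in this representation; the substantive content of the monoid lies in the relations, which are deliberately left out here.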

The generators of the origami monoid satisfy a set of relations, according to 
the types of graphical structures they represent. One of the relations is a generator 
idempotent relation as shown in Fig. 8c. The relations are extensions of the relations of 
the Jones monoids. In [61] two scenarios of relations and graphical representations of 
the elements of the corresponding monoids are given. For each scenario, the number 
of all possible structures is provided through the number of equivalence classes 
of words (or elements in the corresponding origami monoid). Also a polynomial 
time algorithm exists that computes the shortest word for each equivalence class. A 
connection between the Green’s relations of an origami monoid and those of a direct 
product of Jones monoids [62] is given in [60]. In particular, it was shown that an 
epimorphism p : O_n → J_n × J_n induces a bijective correspondence on Green’s 
classes. 

The definition of origami monoids is motivated by the Jones monoids, and the 
origami structure, concatenating two types of strands (scaffolds and staples), implies 
two types of generators with similar relations. This construction of doubling gen- 
erators and imposing substitution relations can be generalized to other algebraic 
structures, or it can be generalized with more than two types of generators. These are 
algebraic problems directly arising from the experimental design of DNA origami. 


5 DNA Origami and Origami Knots 


DNA origami assembly can be confounded by knotting in the scaffolding strand. For 
example, in a preliminary experiment, an essentially planar target did not form well 
when a simply knotted (trefoil) scaffold was used [63]. Thus, tools are needed to avoid 
inadvertently knotted routes when designing the increasingly sophisticated targets of 
DNA origami. On the other hand, Seeman has shown it is possible to engineer 
single-stranded DNA with specified knotted topologies [64], and this capacity for controlled 
knotting may be used to design better (higher yield, larger, smaller, 
more symmetric, more robust, more topologically complex, etc.) nanostructures by 
deliberately exploiting the topology of the knotted scaffold. 

Fig. 9 a A complete graph with seven vertices, K7, with three Hamiltonian cycles colored distinctly. 
b An embedding of K7 on a torus. The edges of the yellow and green cycles go around the torus 
handle. Both those cycles are knotted 

When routing a single scaffold through a graph-like target structure, a typical 
design constraint is avoiding self-crossings [24, 33]. If the target structure is modeled 
by an Eulerian graph cellularly embedded on an oriented surface in 3-space, then 
some of these routes correspond to A-trails, which are Eulerian circuits that turn 
either ‘left’ or ‘right’ at each vertex. If the surface is a sphere, then all A-trails in 
the target structure are necessarily unknotted, but for higher-genus surfaces there are 
settings in which every A-trail is knotted [26, 27]. The complete graph on seven 
vertices, K7, is shown in Fig.9a, and an embedding of K7 on a torus is shown in 
Fig. 9b. Every vertex of K7 has even valency (i.e., valency 6) and hence, the graph is 
Eulerian. Three distinct Hamiltonian cycles are depicted, purple, green and yellow. 
The embeddings of the yellow and the green cycles are knotted. 
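
The claim that K7 is Eulerian can be verified directly. The sketch below builds K7 abstractly, checks that every valency is even, and extracts one Eulerian circuit with Hierholzer's algorithm. It ignores the torus embedding entirely, so the circuit it returns is not claimed to be an A-trail; the function names are illustrative assumptions.

```python
def complete_graph(n):
    """Adjacency sets of the complete graph K_n on vertices 0, ..., n - 1."""
    return {v: set(range(n)) - {v} for v in range(n)}


def all_degrees_even(adj):
    """Euler's criterion for a connected graph: every valency is even."""
    return all(len(neighbours) % 2 == 0 for neighbours in adj.values())


def eulerian_circuit(adj, start=0):
    """Hierholzer's algorithm; assumes adj is connected with all valencies even."""
    remaining = {v: set(nb) for v, nb in adj.items()}
    stack, circuit = [start], []
    while stack:
        v = stack[-1]
        if remaining[v]:
            w = remaining[v].pop()      # follow an unused edge v-w
            remaining[w].remove(v)
            stack.append(w)
        else:
            circuit.append(stack.pop())  # dead end: emit vertex on the way back
    return circuit


k7 = complete_graph(7)
circuit = eulerian_circuit(k7)
used_edges = {frozenset(pair) for pair in zip(circuit, circuit[1:])}
```

The circuit visits 22 vertices (21 edges plus the return to the start) and uses each edge of K7 exactly once. By contrast, all_degrees_even(complete_graph(6)) is False, since every vertex of K6 has valency 5.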

Since standard DNA origami scaffold strands are unknotted, the problems arise 
of determining whether there exists an unknotted route for a scaffolding strand in 
a given geometrically (or surface) embedded target graph, and of characterizing 
graphs which have unknotted routes. Once again, a new area of mathematics emerges, 
as determining knotted and unknotted routing trails is fundamentally different from 
previously studied knots and links in graphs. Prior work focused on intrinsically 
knotted and linked cycles in graphs (see, for example, [65–67]), appearing as cycles 
in the graph. However, for this application, the knots are Eulerian circuits rather than 
cycles. For example, Conway and Gordon [67] showed that every embedding of K7 
in R^3 has at least one knotted cycle. Figure 9 shows an embedding of K7 on the torus. 
By Conway and Gordon [67], it contains at least one knotted Hamiltonian cycle (the 
green cycle, for example). 

In contrast, [24, 26] have shown that every A-trail in the embedding of 
K7 in the torus is unknotted. This follows since the embedding of K7 in the torus 
is checkerboard colorable, and hence every A-trail bounds the union of the black 
regions, and hence a disk. It is consequently unknotted (Fig. 10). Moreover, in [24] 
it is shown that every Eulerian circuit is knotted on a torus if and only if there is a 
non-checkerboard colorable embedding of the graph. 

Fig. 10 a A checkerboard coloring of the embedded K7. b A-trail transitions at a vertex of the 
embedding joining the shaded regions of the embedded K7. These A-trails bound a disk consisting 
of the shaded regions on the torus and therefore they outline K7 with an unknotted routing 
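
Checkerboard colorability of a triangulated embedding can be tested by 2-coloring the dual graph. The sketch below assumes the classical 7-vertex triangulation of the torus, with vertices taken as residues mod 7 and faces {i, i+1, i+3} and {i, i+2, i+3}; this labeling is an assumption and need not match the one drawn in Figs. 9b and 10a. The function name checkerboard_coloring is likewise illustrative.

```python
from collections import defaultdict, deque

# Assumed triangulation: 14 triangular faces on vertices 0, ..., 6 (mod 7).
faces = [frozenset(((i + a) % 7, (i + b) % 7, (i + c) % 7))
         for i in range(7) for (a, b, c) in ((0, 1, 3), (0, 2, 3))]

# Record which faces meet along each edge.
edge_to_faces = defaultdict(list)
for f, face in enumerate(faces):
    for u in face:
        for v in face:
            if u < v:
                edge_to_faces[(u, v)].append(f)

# V - E + F; a value of 0 is consistent with a torus.
euler_characteristic = 7 - len(edge_to_faces) + len(faces)


def checkerboard_coloring(edge_to_faces, n_faces):
    """2-colour the faces so that faces sharing an edge differ.

    Returns a dict face -> 0/1, or None if no such colouring exists.
    Assumes every edge lies on exactly two faces.
    """
    dual = defaultdict(list)
    for f, g in edge_to_faces.values():
        dual[f].append(g)
        dual[g].append(f)
    colour = {}
    for root in range(n_faces):
        if root in colour:
            continue
        colour[root] = 0
        queue = deque([root])
        while queue:
            f = queue.popleft()
            for g in dual[f]:
                if g not in colour:
                    colour[g] = 1 - colour[f]
                    queue.append(g)
                elif colour[g] == colour[f]:
                    return None
    return colour


colouring = checkerboard_coloring(edge_to_faces, len(faces))
```

In this triangulation all 21 edges of K7 appear, each on exactly two faces, and the coloring succeeds with seven faces of each color, in line with the checkerboard colorability used in the argument above.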

For geometrically embedded graphs that are not necessarily embedded on some 
surface, and hence for which A-trails are not necessarily defined, the initial chal- 
lenge is formalizing the constraints on how the scaffolding strand may pass through 
vertices. 

A-trails can be generalized to O-trails, which permits analyzing the knotting of 
strand routes in more general geometric embeddings of graphs in space, or embedding 
on a surface where ‘non-crossing’ smoothings at the vertices are defined [24]. For 
example, Fig. 11 shows K5 embedded as a tetrahedron with a body-center vertex, 
together with an O-trail through it. 


Fig. 11 An O-trail in a geometric embedding of K5 


O-trails are Eulerian circuits in geometrically embedded Eulerian graphs so that 
at each vertex the edge pairings determined by the circuit are all non-crossing (i.e., 
there is a topological disk containing the half edges about the vertex in which both 
the turnings determined by the circuit and the orthogonal turnings are non-crossing). 
For A-trails in a surface mesh, the disk is just a small neighborhood of the vertex, so 
A-trails are a special subclass of O-trails. 
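
For the surface case, the non-crossing condition at a single vertex is purely combinatorial and can be sketched as follows. The half edges around a vertex are numbered 0, ..., d − 1 in cyclic order, and a transition system is a perfect matching of them. This sketch covers only the cyclic-order check; it does not model the geometric disk or the orthogonal turnings in the definition of O-trails, and all names are illustrative assumptions.

```python
from itertools import combinations


def crosses(p, q):
    """Two chords of a circle cross iff exactly one end of q lies strictly
    between the ends of p in the cyclic order."""
    a, b = sorted(p)
    c, d = q
    return (a < c < b) != (a < d < b)


def is_noncrossing(pairing):
    """True when no two pairs of half edges cross."""
    return not any(crosses(p, q) for p, q in combinations(pairing, 2))


def perfect_matchings(points):
    """Yield every way of pairing up the given list of points."""
    if not points:
        yield []
        return
    first, rest = points[0], points[1:]
    for k, partner in enumerate(rest):
        for tail in perfect_matchings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail


half_edges = list(range(6))   # a vertex of valency 6, as in K7
noncrossing = [m for m in perfect_matchings(half_edges) if is_noncrossing(m)]

# The two transition systems that pair each half edge with a cyclic neighbour.
a_trail_transitions = [[(0, 1), (2, 3), (4, 5)], [(1, 2), (3, 4), (5, 0)]]
```

Of the 15 possible pairings at a vertex of valency 6, five are non-crossing, and the two neighbor pairings used by A-trails are among them.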

The DNA origami application requires understanding and controlling the behavior of 
O-trails, both knotted and unknotted, in fixed geometric graphs, as well as over all 
possible embeddings of an abstract graph. Finding O-trails is NP-complete in general, 
since finding A-trails is, even for plane graphs, and the problem of determining whether a 
knot is unknotted is itself computationally hard (it is known to lie in NP, but no 
polynomial-time algorithm is known). 

Both unknotted O-trails and constrained knot embeddings are entirely new directions 
in knot theory. There has been extensive prior work on knots in graphs, but 
this focused on intrinsically linked cycles or knotted Hamiltonian cycles [66–68]. 
Here, however, the DNA origami constraints lead to questions of knots in Eulerian 
circuits, which opens an analogous and equally rich line of inquiry, but now in a 
completely new setting. Although embeddings of a given knot on a standard surface are 
well understood, considering geometric knot invariants under requirements imposed by DNA 
conformation is novel. Furthermore, characterizing graphs with unknotted O-trails 
will help identify target structures amenable to origami self-assembly from unknotted 
scaffolding strands and complements current experimentation [15]. In most applica- 
tions a specific geometric embedding of the target is sought, so a pragmatic goal is to 
characterize classes of embedded graphs for which determining unknotted O-trails 
is computationally tractable. 


6 Where Next? 


The mathematical models discussed here generally follow what is becoming a com- 
mon pattern. The first step is always developing a theoretical formalism that simul- 
taneously captures the essence of a design problem, while providing a foundation 
for the following theoretical work. The second step is developing preliminary results 
for specific experiments. 

When moving from specific experiments to seeking general strategies, e.g., via 
fast computer algorithms, particular design problems can often be prohibitively difficult. 
Thus the third step frequently is providing fast algorithms or proving that fast 
algorithms for general solutions might not be possible, i.e., that the problems are NP-hard. 

Discovering that a DNA self-assembly problem is NP-hard is exciting theoreti- 
cally, because it immediately opens a plethora of related computational problems. 
These include seeking approximation algorithms, finding optimal solutions for par- 
ticular families of problems, and devising pragmatic approaches for urgently needed 
special cases. 

Although an NP-hardness result does not help a lab trying to conduct its next 
experiment, it does prevent wasted effort seeking general strategies. Furthermore, 
a specific NP-hard problem may be reduced to another known NP-hard problem, 
for example, the Traveling Salesman Problem (TSP), for which there already 
exist robust tools such as fast approximation algorithms and algorithms optimized 
for special cases. These then may be adapted to the self-assembly problem, as with 
the TSP reduction for strand routing in [27]. 

DNA self-assembly is now a rich source of theoretical problems. These problems 
and their solutions advance both the mathematics and the self-assembly processes. 
The lab constraints lead to new mathematics and often breakthrough new directions, 
as we point out here. The mathematical foundations supporting innovations in sci- 
ence and industry are often seen years after they are developed, and are sometimes 
unacknowledged, because by the time the theory has been applied, it has entered 
the mainstream of engineering. This emergent mathematical theory, this new ‘DNA 
mathematics’, lays the foundations that can support the further growth of DNA 
self-assembly technologies, and its potential applications, although inspired by DNA 
self-assembly, may be yet to come. 


Abstract Regardless of the scale it is built at, the geometric principles of origami 
make it an architecture both constrained by the act of folding and imbued with 
dynamics by folding. From space stations to molecules, origami solutions are 
inspiring a new type of design that considers its deployment as an integral part 
of the structure. In this essay, I will explain some surprising similarities between the 
ancient art of folding paper origami and the folding of RNA at the nanoscale. 


The invention of folded paper is likely to have arrived very shortly after the invention 
of paper. The simple act of folding the sheet, something so seemingly obvious you 
might not think to give it a name, began to organically grow into the elegant art form 
we today call ‘Origami’. Inspired by over a thousand years of tradition and innovation 
in paper folding, modern scientists and engineers have gained a new appreciation for 
origami design [1]—as folding solutions in paper seem to translate upward to the 
scale of buildings [2] and downward to the scale of molecules [3]. An origami begins 
by developing a pattern of creases in a sheet of paper; the folds allow the flat sheet 
of paper to be reconfigured into different geometric forms. The basic act of folding 
is one of transformation: The mere placement of creases on a flat sheet allows 
for nearly any imaginable three-dimensional animal or plant or abstract form to be 
created—an act almost as magical as life springing out from a seed. 

Folding and packing are the central aspects of paper origami, but are the rules of 
origami universal to all folded things? Much like origami, folding and packing are 
both extraordinarily important for the stabilization of structural RNA molecules, 
which play essential roles in regulation, chromosome maintenance and protein 
biosynthesis [4]. Are there lessons to be learned from origami that can help us to 
better navigate the problems of biopolymer folding in a tactile way? 

Similar to how living animals develop from the confines of a single cell that 
divides, an origami model begins as creases within the bounds of a sheet of paper. 
Origami seems to be an ideal medium for mimicking the shapes and forms found 
in nature. Origami expert Jun Maekawa points out that in many traditional origami 
models, the colored top surface becomes the outside and the bottom of the sheet 
becomes the inside, and so as a consequence, the origami has the same topology as 
a blastula [5]. Modern origami techniques are now even able to mimic the stripes 
of zebras [6] and spots on beetles [7] by using both sides of the paper strategically. 
Nature perfected folding before humans invented origami, and examples can be 
observed all over nature: in the unfurling petals of flowers, the unfolding of insect 
wings, or the way that some plants dry out and coil up to spread their pollen or seeds. 
In all of these cases, the pattern of folds provides major evolutionary advantages to 
the organism. The ability to contract and expand (or deploy into a larger shape) is the 
mechanism of motion for all living things, even at a molecular level. From the way 
that our genes are densely packed into a fractal-like network of coiled coils, to how 
protein expression is dynamically turned on and off just when needed, deployable 
structures are part of the mechanism of life. The art of origami is indeed a beautiful 
representation of this concept. 

Some of the earliest innovations in the art of paper folding originate from Japan. 
Traditionally folded from very high-quality paper, origami was mostly used for ceremonial 
purposes, at weddings or as decorations on gifts [8]. It was not until around the 
early 1900s in Japan that books formalizing the art were published, and the name 
‘Origami’ became associated with paper folding. The Japanese paper crane and vari- 
ants, along with a handful of traditional models from other cultures, such as the 
paper balloon and pinwheel fold, represented the total knowledge of paper forms at 
the time. Then, in the 1950s, Akira Yoshizawa entered the scene and both revolutionized 
and popularized origami when he published ‘Atarashii Origami Geijutsu’ (New 
Origami Art), introducing many innovative new origami models and techniques for 
shaping origami, and perhaps even more importantly, standardized the diagrams used 
to depict the folding process [9]. The seeds were planted, and the ingenuity of the art 
form was accelerated from there. As the complexity of creasing patterns increased, 
the intricacy of the resulting forms began to gain origami international recognition 
as a new medium for sculpture. By the 1980s, there were thousands of published 
origami designs. 

The formalization of axioms describing the geometry of origami [10] and contri- 
butions from innovators in origami design, such as Maekawa’s system for designing 
complex origami folds [5], inspired the creation of computer algorithms [11] that are 
able to automate much of the design of crease patterns for origami. The most notable 
of these algorithms, TreeMaker [12], written by Robert Lang in 1996, was the first to 
enable the design of origami with an arbitrary number of flaps of any length, which 
become the arms and legs of the origami (see Fig. 1a). TreeMaker led to a ‘Cambrian 
explosion’ of new complexity in origami folding among origami designers as they 
raced to generate astonishingly intricate models with ever more flaps with which to 
sculpt extra details like additional legs, claws, feathers and scales. 

After many contributions from mathematicians, physicists and engineers [1], 
origami has become a highly multidisciplinary area of science. There are endless 
practical applications for folding, as origami can be found in many places where 
size and deployability are important, such as heart stents, airbags, or folding solar 
panels. Here, I will share a story about folding, how origami is a physical analogy for 
biological folding landscapes, and why multistability is both important and inherent 
in folded systems. 

Fig. 1 Modularity in origami. a an origami ‘molecule’ fold based on the isosceles right triangle, 
one of the simplest folds that defines a flap. The lines show the position of folds, and the blue circle 
slice represents the area of the paper that contributes to the flap (right). b the four traditional bases of 
Japanese origami: the kite base, the fish base, the bird base and the frog base. Each of the bases tiles 
a different number of units of the isosceles triangle molecule shown in a, either 2, 4, 8 or 16. The 
blue areas show the part of the paper that will make a flap. c the folded form of each of the origami 
bases with 1, 2, 4 or 5 flaps, respectively. d an example origami folded from each base. Designs 
with a greater number of flaps achieve much higher compaction in the final model. e an example of a 
modular folding pattern. The waterbomb base can be tiled such that each row is offset to the middle 
of its neighbors above and below. Figures a–d adapted from [13] 


1 Origami Molecules 


Origami is an art form that continually pushes the limits of our understanding of the 
mechanics of folding. As artists strive to pack more detail and information into the 
creases confined to the bounds of a sheet of paper, designs can become so intricate 
that they take many hours to fold by hand. Origami models are often folded in two 
main stages: Initially an origami base is made, a large interconnected fold using the 
entire paper that creates a number of flaps (Fig. 1b, c). Next, the flaps that were 
created are sculpted into details such as legs or wings and other features (Fig. 1d). 
The base functions as a scaffold to organize flaps, and the flaps are used to create all 
the details that define the model. 

Surprisingly, despite how diverse the variety of origami has become, most origami 
features the repeated usage of key folds, which are called ‘molecules’ in the Japanese 
tradition [13]. Molecule folds, of which there are many varieties, turn out to be 
especially useful because of their modularity within origami designs. One of the 
most recurrent molecule folds of origami, shown in Fig. 1a, is an isosceles triangle 
that is bisected by a crease at 22.5°, defining a single flap. Two copies of the molecule 
fold can be fit into a square (Fig. 1b, leftmost) to produce a kite base, the simplest of 
the four classic bases of origami: kite, fish, bird and frog. The incredible practicality 
of this fold is that, due to its shape, it can be scaled successively smaller to generate 
all of the traditional Japanese origami base folds (Fig. 1b), and this pattern can even 
be extended to larger designs [13]. 

Starting from the same molecule fold (Fig. 1a), each of the traditional origami 
bases produces different numbers of flaps (Fig. 1c), and the area of paper used for 
each flap is highlighted in blue on the crease pattern (Fig. 1b). The folded base can be 
further sculpted into familiar origami forms by adding additional detail folds to the 
flaps (Fig. 1d). The resulting origamis become smaller with respect to the paper as 
more flaps and details are added. To compensate for this fact, origami artists often use 
very large paper up to 50 cm square or larger, to produce the most complex models 
[14]. In the process of exploring how to subdivide the paper to maximize the number 
and size of flaps, artists discovered and documented many new rules for design: For 
example, common origami base folds, such as the waterbomb fold (Fig. 1e, left), 
can be tiled into larger and more complex folds (Fig. 1e, right). The concept of fold 
modularity in origami was pushed to new limits when several innovative origami 
designers at the time began devising new origami by combining molecule folds in 
new patterns [13]. The modularity of folding origami allows for common folds to be 
merged in surprising ways. John Montroll used this concept to elegantly create a bird 
base with a fifth extra flap [15], and Jun Maekawa created the elongated features of 
a crocodile [5] by merging two different traditional base folds. 

Around the same time that the field of origami was having a breakthrough with 
respect to modular design principles, scientists around the world were just beginning 
to experiment with modular strands of DNA produced by new solid-phase synthesis 
technology—initiating the fields of DNA and RNA nanotechnology. The lab of 
Nadrian Seeman was investigating ways to program DNA strands to assemble into 
stable three-dimensional configurations, such as a cube [16]; and in this pursuit, they 
designed a new fold of DNA that would become the ‘origami molecule’ for nanotech- 
nology—the double crossover junction [17]. Just as early progress in origami design 
was made by creating variations elaborated from one of a few different origami base 
folds, the field of nanotechnology grew by building larger and more complex structures 
using the double crossover as a structural strut [18] and by folding genomes 
into repeating arrays of double crossovers that we call scaffolded DNA origami [3]. 

In parallel with developments in DNA nanotechnology, a theory of structural 
modularity for RNA molecules was being developed [19]. Westhof, Masquida and 
Jaeger hypothesized that RNA is modular and that new structures can be composed 
by rearranging and merging fragments of existing structures. This theory would 
lay the groundwork for developing a modular design method for constructing RNA 
structures [20]. RNA is the working component behind many biologically important 
molecular machines, responsible for protein synthesis and information processing in 
all living cells. Because the RNA machines of life have a modular construction that 
we can learn to use, there are many possibilities for the rational design of RNA by 
recombining structural modules. 

Early work exploring the modularity of RNA tertiary interactions demonstrated 
that modular design could be applied to create programmable nanoscale assembly 
units of RNA [21]. Modular RNA structures self-assemble into defined shapes guided 
by the incorporation of different aptamers, junctions and programmable connectors 
embedded within their sequence. The full versatility of structural modularity for RNA 
initially proposed by Jaeger and colleagues [19, 20] was demonstrated by creating 
a building system of modular RNA assembly units that link up to form defined 
assemblies and repeating lattices (Fig. 2a) [22]. By merging different structural 
modules corresponding to bends, junctions and connectors, arbitrary and highly- 
detailed designs such as nano-hearts were produced (Fig. 2b). RNA architectures 
can be rationally mapped to a single-stranded path, encoded into a gene, and then 
expressed as self-folded shapes [22]. Although in RNA the folds are encoded into 
sequences of nucleotides, rather than formed by creases in origami paper, it is the 
same property of modularity of folds that makes it possible to fold paper into origami 
animals and compose RNA structures that self-assemble (or fold) into nanoscale 
shapes. 

Drawing inspiration from the incredible structures of DNA origami [3] built using 
the double-crossover motif [17], I developed an RNA version of the motif that has 
a single-stranded topology [23]. In DNA origami, numerous short staple-strands 
help a long scaffold strand to fold into a compact form. Now, in RNA origami, 
programmable kissing loops (KLs) define one of the edges of the double-crossover 
motif and consequently enable designs based on this pattern to be produced 
cotranscriptionally (Fig. 2c). KL interactions span the space of a normal helix, but function 
to both connect and coaxially align distant helices. The single-stranded crossover 
fold, which I will call a KL crossover, is the equivalent of an origami ‘molecule’ 
fold for RNA and forms a compact modular unit that provides an underlying struc- 
ture for numerous helices that can function as ‘flaps’ (Fig. 2c, d). Each unit of the 
KL crossover fold creates a geometric closure via the formation of a kissing loop 
[24], adding a growing element of stability to the origami, especially when coopera- 
tively forming many KLs in alignment. As a scaffold material, RNA origami designs 
comprise a core of multiple KLs that are flanked by hairpins (Fig. 2c, e). These helix 
‘flaps’ can be functionalized with a toolbox of programmable connectors, protein 
binding sites, or fluorescent light-up aptamers such as Spinach or Mango (Fig. 2c, 
inset). 

Fig. 2 Modularity of RNA origami. a programmable RNA lattices formed from square and 
triangle RNA units. b self-assembling RNA hearts from multiple units (top) or produced 
cotranscriptionally from a single strand (bottom). c the ‘origami molecule’ fold of RNA origami, a kissing 
loop interaction stabilizes the formation of a double-crossover section (inset and yellow box). On 
the edge of the structure are terminal hairpins, analogous to the ‘flaps’ of origami, that can be 
functionalized with a variety of different RNA structural motifs such as programmable kissing loops, protein 
binding sites and fluorescent light-up aptamers such as Spinach and Mango. d larger and more 
complex RNA structures can be built by merging together copies of the fold in a. RNA origami 
structures range from 300 to 2300 nts in scale. e functionalization of an RNA origami scaffold with 
binding sites for proteins fused to CFP and YFP. Adjacent binding sites bring the CFP and YFP 
proteins close enough to produce FRET. Figures a–b adapted from reference [22]. Figures c–e 
adapted from reference [25] 

RNA origami structures containing two, four, six or more copies of the KL 
crossover unit can be constructed because of the structural modularity, increasing 
design complexity through simple repetition [25] (Fig. 2d) similar to the molecule 
folds of paper origami (Fig. 1b, e). Analogous to the way that flaps are developed 
with detail folds in an origami model (Fig. 1d), the free arms of the RNA origami 
structure can be embedded with RNA motifs that ultimately decide the functionality 
of the structure. For demonstration, RNA origami was produced containing binding 
sites for two fluorescent protein complexes that can interact to produce FRET, a type 
of resonant energy transfer that can only occur when fluorophores are within a few 
nanometers of each other (Fig. 2e). Validating that origami scaffolds can colocalize 
and position proteins with precision, scaffolds created with adjacent binding sites 
produced a much brighter FRET response than scaffolds designed with far-apart 
binding sites or no binding sites at all [25] (Fig. 2e). In a paper origami, the folds 
of the base are all interconnected, while the folds of details are local and can be 
adjusted without compromising the base form. Similarly, RNA origami architecture 
uses a repeating modular pattern of interconnected local folds to form a stable core 
structure, out from which project helical arms that can be sculpted with details from 
natural biological folds to guide protein binding and to position aptamers or other RNA 
active sites with precision. 
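
The steep distance dependence that makes FRET a useful proximity readout follows from the standard Förster relation E = 1 / (1 + (r / R0)^6), where r is the distance between the fluorophores and R0 is the Förster radius at which the efficiency is one half. The sketch below evaluates this relation; the default R0 of 5 nm is an illustrative placeholder and not a value taken from [25].

```python
def fret_efficiency(distance_nm, forster_radius_nm=5.0):
    """Foerster relation E = 1 / (1 + (r / R0)^6).

    forster_radius_nm is an assumed, illustrative value; the appropriate R0
    depends on the specific donor and acceptor pair.
    """
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


# Efficiency at half, one and two Foerster radii.
near, at_r0, far = (fret_efficiency(r) for r in (2.5, 5.0, 10.0))
```

Doubling the separation from R0 to 2 R0 drops the efficiency from one half to 1/65, which is why binding sites placed a few helices apart on a scaffold can give clearly distinguishable signals.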


2 Origami Design Algorithms 


The culmination of years of exploration in the modularity of origami folds by 
numerous experts led to a mathematical formalization of paper origami and folding 
[10], systematic new approaches to origami design [5] and a framework for creating 
arbitrarily complex folds [11]. These methods introduced a new school of thought 
to the origami world: designs became abstracted as tree-like stick figure diagrams, 
and the placement of origami flaps became distilled into a computational optimization 
problem. In particular, the TreeMaker [12] algorithm was very influential in the 
origami world because it enabled a new level of control over the number and relative 
size of every flap in a design. With this software, users can computationally generate 
creasing patterns for a wide variety of new origami. 

Origami design using TreeMaker begins by studying the subject to represent in 
origami, in this example, a lizard (Fig. 3a). First, the lizard is abstracted as a weighted 
tree graph (Fig. 3b); each leaf on the tree will be folded into a separate flap, and 
the weights reflect the relative lengths of those body segments. Circles with radii 
weighted by the tree are placed so that their centers lie within the bounds of the paper and 
are then scaled so that their perimeters touch but do not overlap (Fig. 3c); each circle 
represents where a flap will be created. The distance between the centers of two 
circles matches the weight of the path between the corresponding leaves on 
the edge-weighted tree. In this example, those weights all happen to be one, 
and so, for example, the distance between a front leg and a back leg is three (Fig. 3c). 
The blue ‘river’ area of the design that does not contain any circles functions like a 
flap that is connected at both ends and will be what the body of the design is folded 
from (Fig. 3c–e). Next, the algorithm fills in the polygons with molecule fold patterns 
(Fig. 3d). The ‘rabbit ear’ molecule is a general fold solution for a triangular space, 
and more complex molecule patterns exist that allow any polygon to be filled with 
creases [26]. During the folding compaction, the creases will function as hinges, and 
the polygons of paper between the creases are called ‘facets’. As the fold is collapsed, 
all of the facets move in concert (Fig. 3e), and the cross-section of the fully collapsed 
base fold has the same shape as the tree (Fig. 3b). Once the origami is folded, the 
addition of detail folds [13] and other well-established shaping techniques, such as 
wet folding [9], can be applied to further refine the origami so that it resembles a 
lizard (Fig. 3e). 
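
The packing condition described above can be sketched in a few lines of code. This is a minimal illustration, not TreeMaker's implementation: the node names, the `EDGES` table and the helper functions are all hypothetical, and the check uses the standard inequality form of the condition (each pair of leaf centers must be at least as far apart on the paper as the scaled path length between them on the tree).

```python
from itertools import combinations
from math import dist

# Hypothetical unit-weight tree for the lizard; node names are illustrative only.
EDGES = {
    ("head", "shoulder"): 1.0,
    ("front_left", "shoulder"): 1.0,
    ("front_right", "shoulder"): 1.0,
    ("shoulder", "hip"): 1.0,
    ("back_left", "hip"): 1.0,
    ("back_right", "hip"): 1.0,
    ("tail", "hip"): 1.0,
}

def adjacency(edges):
    """Build an undirected adjacency map from an edge-weight table."""
    adj = {}
    for (a, b), w in edges.items():
        adj.setdefault(a, {})[b] = w
        adj.setdefault(b, {})[a] = w
    return adj

def path_length(adj, start, goal):
    """Length of the unique path between two nodes of a tree (depth-first search)."""
    stack = [(start, None, 0.0)]
    while stack:
        node, parent, d = stack.pop()
        if node == goal:
            return d
        for nxt, w in adj[node].items():
            if nxt != parent:
                stack.append((nxt, node, d + w))
    raise ValueError("nodes are not connected")

def leaves(adj):
    """Leaf nodes (one neighbor each); every leaf becomes a flap."""
    return sorted(n for n, nbrs in adj.items() if len(nbrs) == 1)

def packing_is_valid(adj, centers, scale, tol=1e-9):
    """True if every pair of leaf centers is at least scale * tree distance apart."""
    return all(
        dist(centers[a], centers[b]) + tol >= scale * path_length(adj, a, b)
        for a, b in combinations(leaves(adj), 2)
    )
```

With this tree, `path_length(adjacency(EDGES), "front_left", "back_left")` returns 3.0, matching the front-leg to back-leg distance quoted above. An optimizer in the spirit of TreeMaker would then search for the centers that allow the largest `scale` while `packing_is_valid` still holds.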

For paper origami, algorithms simplify the work of producing complicated 
models by automating the design of folds starting from a tree diagram; in RNA, 
algorithms are used instead to create and optimize a sequence that will fold into 
the desired tree structure. An RNA can be thought of as a kind of one-dimensional 
origami that is deployed by the act of transcription; the leaves and branches of the 
tree structure and its folding pathway are encoded into the sequence of the RNA 
strand. Be it creating a pattern of paper creases that define a shape [27] or a pattern 
of nucleotides that define a fold [28], design is difficult—and the time to compute 
a design increases rapidly with the complexity of the design. Once the design is 
completed, it can even take an origami master considerable time and patience to fold 
and collapse a complex crease pattern. In comparison, an RNA origami transcript that 
is designed well will fold all by itself! However, the key phrase is ‘designed well’, 
and because of the complexity of the inverse folding problem, design optimization 
by computer algorithms is absolutely essential. 

Recently, I developed the RNA origami design software ROAD [25], which 
enables researchers to model, optimize and encode large and complex RNA folds 
into a single-stranded sequence (Fig. 2d), allowing RNA origami nanostructures that 
are kilobases in scale to be produced. ROAD, in a similar manner to TreeMaker, is 
an iterative optimization algorithm that attempts to design origami structures based 
on an inputted tree. However, rather than design crease patterns, ROAD designs a 
sequence of nucleotides that encodes folds in the RNA strand. The problem of inverse 
folding by sequence design is approached by guessing an initial sequence and then 
repeatedly revising that design to improve its score based on a number of heuristic 
tests. This approach relies on repeatedly evaluating guesses in an energy model that 
simulates RNA folding and benefits from fast algorithms such as ViennaRNA for 
reliably computing RNA folds [29]. However, pseudoknots, the interactions that 
form between distal single-stranded regions of a structure, are notoriously difficult 
to compute—spelling a potential problem for designs that require numerous KL 
connectors. 
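
The guess-and-revise loop described above can be illustrated with a toy example. This is only a sketch under stated assumptions and is not the ROAD code: the `score` function is a stand-in heuristic of my own choosing (it rewards target positions that are able to base pair and penalizes four-base homopolymer runs), whereas a real design tool would score each guess by folding it in an energy model. All function names are hypothetical.

```python
import random

# Watson-Crick and G-U wobble pairs
PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}

def pair_table(structure):
    """Map each '(' index to its matching ')' index in a dot-bracket string."""
    stack, table = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            table[stack.pop()] = i
    return table

def score(seq, table):
    """Toy heuristic: pairable target positions minus 4-base homopolymer runs."""
    paired = sum((seq[i], seq[j]) in PAIRS for i, j in table.items())
    runs = sum(
        seq[k:k + 4] in ("AAAA", "CCCC", "GGGG", "UUUU")
        for k in range(len(seq) - 3)
    )
    return paired - runs

def design(structure, steps=2000, seed=0):
    """Start from a random guess and keep single-base revisions that do not lower the score."""
    rng = random.Random(seed)
    table = pair_table(structure)
    seq = [rng.choice("ACGU") for _ in structure]
    best = score("".join(seq), table)
    for _ in range(steps):
        k = rng.randrange(len(seq))
        old = seq[k]
        seq[k] = rng.choice("ACGU")
        new = score("".join(seq), table)
        if new >= best:
            best = new      # accept the revision
        else:
            seq[k] = old    # reject and restore the previous base
    return "".join(seq), best
```

Calling `design("((((....))))....((((....))))")` returns a sequence and its score. Accepting neutral moves (`>=` rather than `>`) is a deliberate choice here, since it lets the search drift across plateaus instead of stalling at the first sequence that cannot be improved by a single mutation.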

Fig. 3 Origami design with TreeMaker. a a lizard, as inspiration for design. b a tree diagram 
abstraction of the lizard, where each branch has a length of one. c to represent the tree in the bounds 
of the paper, each branch of the tree becomes abstracted as a circular area, and the body becomes 
the blue ‘river’ section, arranged to maximize the packing. Triangles on the graph connecting the 
nodes have lengths corresponding to the path length along the tree. d each polygon of the main 
crease pattern can be filled using an origami ‘molecule’ fold. Here, a general example for triangles 
is shown where creases bisecting each angle fold the triangle into three flaps. Each flap is colored 
according to the node it is assigned to in c. e sequence depicting the collapsing of the crease pattern 
into folded form with the same cross-section as the tree graph. With sculpting, the base fold can 
then be made to look like a lizard. Figure adapted from reference [12] 

While many software packages already exist to design RNA, two features unique to ROAD 
make it ideal for producing RNA origami designs: First, it is able to rapidly assign 
origami KL sequences by systematically cross-checking all loops against each other, 
whereas other software packages that are able to design pseudoknots do not scale to such 
large designs. Second, it designs sequences with the constraints of cotranscriptional 
synthesis as a main consideration, avoiding specific sequences and patterns known 
to interfere with transcription or the experimental workflow. While early modular 
RNA designs were produced by hand-aligning structural modules to model RNA 
particles, for example, RNA squares created with exchangeable motifs for the corners 
[30] or tilings of small square and triangular assemblies (Fig. 2a) [22], the size and 
complexity of designs were ultimately very limited in scale compared to what can 
now be achieved with the computer-aided modeling and design offered by ROAD 
(Fig. 2d) [25]. 
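
The idea of cross-checking every loop against every other can be sketched as follows. This is a simplified illustration and not ROAD's procedure: the loop sequences are invented, only full-register antiparallel Watson-Crick matches are counted (no offsets, no wobble pairs), and the `max_offtarget` threshold is an arbitrary assumption.

```python
from itertools import combinations

COMP = {"A": "U", "U": "A", "G": "C", "C": "G"}

def revcomp(seq):
    """Reverse complement of an RNA sequence."""
    return "".join(COMP[b] for b in reversed(seq))

def matches(a, b):
    """Number of positions at which loop a can pair antiparallel with loop b."""
    return sum(x == y for x, y in zip(a, revcomp(b)))

def cross_talk(loops, partners, max_offtarget=3):
    """Return index pairs of non-partner loops with more than max_offtarget matches."""
    intended = {frozenset(p) for p in partners}
    flagged = []
    for i, j in combinations(range(len(loops)), 2):
        if frozenset((i, j)) in intended:
            continue  # designed KL partners are supposed to pair
        if matches(loops[i], loops[j]) > max_offtarget:
            flagged.append((i, j))
    return flagged
```

For example, with the hypothetical loops `["ACGAGU", "ACUCGU", "GCCUAA", "UUAGGC"]` and partners `[(0, 1), (2, 3)]`, no cross-talk is flagged; appending a fifth loop `"ACUCGA"`, which differs from loop 1 at a single position, flags the pair `(0, 4)`.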

The development of both paper origami and DNA/RNA origami has been greatly 
accelerated by computer-aided design algorithms. However, just as the crease patterns 
generated by TreeMaker do not provide step-by-step folding instructions and do not 
guarantee that the compacted design will be an elegant one, RNA origami designs, 
even with computer-optimized sequences, still may not fold correctly if the order 
and staging of folding events are not planned well. Ambiguity of fold definitions, the 
topology of the strand/paper and the order of folding events are all factors determining 
the shape of the folding landscape for origami structures. 


3 Origami Folding Pathways 


Multistability occurs whenever a system has two or more stable equilibrium states, 
or when a folding pathway gets stuck in non-equilibrium states. 
The presence of competing stable states is a common consequence of complex and 
dynamic systems, and origami is no exception. When folding a paper origami, certain 
combinations of mountain and valley folds produce bistability. A classic example 
of this is the waterbomb fold (Fig. 4a), which can transition between triangular 
and square forms by inversion. The square form of the waterbomb is also commonly 
called the ‘Preliminary Fold’ [15] and is a starting fold for many origami models. 

Fig. 4 Bistability in origami and RNA. a the waterbomb fold can collapse into two entirely 
different structures and is the most common example of bistability in origami. b a simple bistable 
RNA structure; it can alternate between either two short or one long hairpin. The energy barrier to 
transition between the structures can be as high as the unfolded state, indicated by arrow. Figure 
adapted from [31]. c the adenine riboswitch is a classic example of bistability. A color-coded arc 
representation of the competing secondary structures (left) shows how the terminator hairpin has 
bistability with stems P1 and P3. d schematic of adenine sensing, depicting a window during 
transcription where the switch can activate based on ligand binding. If the aptamer is not stabilized 
in time, then the terminator stem forms and prevents the production of the rest of the RNA strand. 
Figures c–d adapted from reference [32] 

The central vertex of the waterbomb fold is a ‘node’ that moves to actuate the 
inversion of the fold. As a node of the waterbomb is inverted, the fold becomes 
unstable at the transition point between the two states when the paper is fully flattened. 
This is because paper has non-zero thickness; creasing the paper sets a new preferred 
angle in the fibers of the paper along the fold lines, leading to the paper resisting 
being fully flattened. In the case of the waterbomb fold inversion, elastic deformation 
of the crease away from its preferred angle during the transition stores energy as the 
paper ‘pops’ from one fold to the other as you force the transition [33]. In other 
cases, such as for concentrically pleated folds, energy can be stored in the bending 
of non-triangular facets as the paper balances trying to be perfectly flat along the 
facets with the position and bend of the creases [34]. Bistability in origami is caused 
by competing forces between the creases wanting to bend and the facets of paper 
between the creases resisting bending. 

In RNA, an analogous bistable fold to the waterbomb would be two short hairpins 
that can transition into a single longer hairpin (Fig. 4b). Just as bistable folds can 
be found all over origami, bistable sequences are naturally prevalent in RNA and 
furthermore are so simple to design that a bistable sequence can be produced for 
any pair of two structures [31]. The tendency for RNA sequences to have multiple 
folded states of similar energy can present a real challenge for producing well-folded 
RNA structures. Just as for origami, multistability and topology 
are both major factors in how RNA sequences fold. Multistability is a property 
of sequences that is often used in important ways by functional RNA molecules. 
Because RNA is synthesized directionally from 5’ to 3’, kinetic control of folding 
in RNA can be used to direct folding down a specific folding pathway. For example, 
the adenine riboswitch (Fig. 4c) is the classic example of bistability and can be 
controlled by tipping the stability balance with ligand binding at an early point during 
the transcription of the sequence [32]. Similarly, the more recently characterized ZTP 
riboswitch navigates its folding landscape in a complicated manner, but its behavior 
is ultimately determined by ligand binding during a narrow window of time early 
in the transcription [35]. In a linear folding pathway, the ability to freely transition 
between bistable alternatives occurs during the narrow window of time when the 
strand is actively folding. 
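
The two-hairpin versus one-hairpin example can be made concrete with a small compatibility check. This is a minimal sketch: the sequence and both dot-bracket structures are hypothetical, and the function names are my own. Note that being able to form both sets of base pairs is only a necessary condition; true bistability also requires the two folds to have similar energies, which this check does not evaluate.

```python
# Watson-Crick and G-U wobble pairs, written as two-letter strings
CAN_PAIR = {"AU", "UA", "GC", "CG", "GU", "UG"}

def pairs(structure):
    """List the (i, j) base pairs of a dot-bracket structure."""
    stack, out = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            out.append((stack.pop(), i))
    return out

def compatible(seq, structure):
    """True if every pair in the structure can be formed by the sequence."""
    return len(seq) == len(structure) and all(
        seq[i] + seq[j] in CAN_PAIR for i, j in pairs(structure)
    )

def is_bistable_candidate(seq, struct_a, struct_b):
    """True if the sequence is compatible with both alternative structures."""
    return compatible(seq, struct_a) and compatible(seq, struct_b)
```

For instance, the sequence `GGGGAAAACCCCGGGGUUUUCCCC` is compatible with both `((((....))))((((....))))` (two short hairpins) and `((((((((((....))))))))))` (one long hairpin), so `is_bistable_candidate` returns `True` for it.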

In paper origami, the crease patterns will frequently have multistable elements 
with many bistable node points. In order for such a structure to be folded correctly, all 
of the nodes need to point in the correct direction at the start of the collapse (Fig. 3e). 
Attempting an origami collapse with even a single misaligned node will lead to 
a conformationally trapped configuration that cannot flatten any further (Fig. 5a). 
This type of conformational trap is similar to the way that biopolymers can become 
kinetically trapped during their folding. Two basic properties in common between 
biopolymers and origami allow the problem of conformational trapping to arise: 
First, the paper or molecular strand cannot be stretched or compressed, only folded. 
Second, the paper or molecular strand cannot intersect itself in the folded form nor 
at any intermediate folding state [36]. 

Fig. 5 Origami conformational traps. a a hypothetical energy landscape for a paper crane. Similar 
to the waterbomb fold, the bird base has a bistable landscape, although only one of the two folds 
can collapse fully and be made into a crane. Starting at the unfolded state, the central node can be 
displaced either up or down, leading to either the misfolded or collapsed state, respectively. The 
final folding of the model into a crane is represented by a large energy barrier. b a hypothetical 
energy landscape for the hairpin ribozyme junction, showing the magnesium-induced stabilization 
of the folded and collapsed states. Magnesium binding (red dots) overcomes electrostatic repulsion 
and allows the RNA to condense into a tightly packed structure upon rearrangement. Alternative helix 
stacking order can produce partially stabilized structures that cannot fully compact. An interaction 
between the green and blue helices drives the final stabilization of the collapsed structure into the 
folded structure. c the order of folds can lead to bifurcation; for example, adjacent valley folds 
can lead to different stacking orders. Additionally, if either segment A or C is longer than B, then 
one of the two possible stacking orders becomes conformationally blocked. Figure c adapted from 
reference [36] 

To illustrate how conformational trapping can occur, consider the crease pattern 
for the paper crane base, one of the simplest variants of the waterbomb fold that 
also has a trapped state (Fig. 5a). The energetic landscape of a waterbomb fold has 
been measured using mechanical models [33], and for the hypothetical crane fold I 
illustrate something similar but with one of the two states having a lower energy than 
the other (Fig. 5a, misfolded vs. collapsed). The highest energy point corresponds 
to the unfolded state of the paper, and this is because it takes energy to elastically 
deform the creases from their preferred angle. The central node point or ‘flap tip’ 
of the waterbomb fold pattern can be pushed either upward or downward, releasing 
energy as the structure begins to compact. If it starts to compact toward the misfolded 
state, it will eventually reach a point where the structure cannot compact any further 
(Fig. 5a, left). In order to reach the correct folded state from the misfolded state, 
the entire sheet must first go back through the unfolded state (Fig. 5a, right), a 
common obstacle in the compaction of complex crease patterns for paper origami 
[37]. Finally, to create the fully developed crane model from the collapsed crane base 
requires additional deformations, and is represented by a significant energy barrier 
(Fig. 5a, right). 

Just like for paper origami, energy is required to unfold a misfolded RNA struc- 
ture before refolding it into a correct state, and for RNA, the barrier to transition 
between alternative secondary structures is quite high due to how strong base pairs 
are (Fig. 4b). Thus, if an RNA falls into a kinetic trap while it is forming base pairs, 
it can take a long time to refold those misfolded regions depending on the height 
of the energy barrier between alternative structures. For this reason, it is important 
to consider not only the final folded structure, but also all of the possible interme- 
diate structures, when evaluating if a particular RNA design is prone to be stuck in 
folding traps. Interestingly, it is possible at a theoretical level to intentionally design 
sequences with extremely long and winding low energy-barrier folding pathways 
[38]. This suggests that it might be possible to design RNAs that change shape over 
time or even perform complex computations by folding. 
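
To give a feel for how strongly refolding time depends on barrier height, the following sketch assumes a simple Arrhenius-type rate law, in which the rate scales as exp(-barrier/RT). That rate law is my assumption for illustration and is not taken from the cited work; because only a ratio of rates is computed, no attempt frequency needs to be assumed. The function name is hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)

def relative_slowdown(extra_barrier_kcal, temp_c=37.0):
    """Factor by which an Arrhenius-type rate drops when the barrier rises by extra_barrier_kcal."""
    rt = R_KCAL * (temp_c + 273.15)  # thermal energy in kcal/mol
    return math.exp(extra_barrier_kcal / rt)
```

Under this assumption, raising a barrier by 5 kcal/mol at 37 °C slows the transition by a factor of a few thousand, which is why even a handful of extra base pairs that must be broken can turn a fast rearrangement into a long-lived trap.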

Folding via a sequential folding pathway, for example, as a riboswitch folds 
(Fig. 4d), can be advantageous for controlling the direction of bifurcated nodes. 
Traditional origami provides a set of precise folding steps to produce the paper crane 
(Fig. 5a) that entirely avoids any chance of misfolding. By contrast, crease patterns 
produced in computational origami can have hundreds of folds that would ideally 
need to collapse in a concerted manner; however, the origami folder is limited by 
their two hands. Computational patterns are not generated with any folding sequence 
and often contain large irreducible folds that are interlinked and need to be collapsed 
simultaneously, and so sequential folding may not even be possible for many patterns. 
It is an important point that collapsing folds can produce origami that could not 
otherwise be produced in a stepwise manner; this was first documented by Akira 
Yoshizawa in 1959 when he innovated his incredible Cicada origami model, which 
collapses in a single motion from a tessellation of eight copies of the bird base fold 
[9]. 

Even with all the proper creases in place, it can take a good measure of skill 
to collapse a computational origami pattern in this manner. Expert origami folders 
often approach this problem by pushing each flap tip vertex in the proper direction, 
bit by bit, over the entire pattern until it reaches a tipping point and is able 
to compact [36]. Recent advances in the computer simulation of paper origami are 
able to estimate facet bending strain and model an idealized simultaneous collapse 
of all the creases in a design in a continuous motion [39]. The concerted collapse 
of a crease pattern echoes the annealing process that is often used to prepare DNA 
origami [3], whereby DNA strands are heated until they melt apart and then slowly 
cooled to let all interactions condense simultaneously. While DNA origami is typi- 
cally folded from hundreds of short staple strands that work together to force a long 
scaffold strand to fold into a three-dimensional structure, there are also DNA and 
RNA origami structures that can collapse from a single long strand by annealing 
over a long temperature ramp [23, 40]. By contrast, the more biologically relevant 
cotranscriptional preparation of RNA origami is comparable to the stepwise folding 
instructions of traditional origami. Interestingly, the folding space for heat-annealed 
RNAs and cotranscriptionally folded RNAs does appear to be quite different [23], 
with annealing having the ability to produce some structures that kinetic folding was 
not able to, much like what was observed to be the case for paper origami folding by 
Yoshizawa [9]. 

In addition to multistability in base pairing, such as is utilized to channel the 
folding in riboswitches, RNA structures also frequently encounter multistability in 
the collapse of multi-helix junctions. This is because the problem of folding is not 
just specifying the helices that fold, but also how the helices should be oriented with 
respect to one another. The helix packing stage of RNA folding can take substantially 
longer than the initial folding [41], especially for larger structures with multiple 
junctions that need to rearrange. This type of structural rearrangement is depicted in 
Fig. 5b, showing the folding of the hairpin ribozyme [42]. In the hairpin ribozyme, the 
blue and orange helices can coaxially stack in different ways, but only one conformation 
is further stabilized by an interaction with the green helix (Fig. 5b). Compared to 
multistability in base pairing (Fig. 4b), the energy barrier to move between alternate 
packing conformations is much lower (Fig. 5b), meaning that RNAs can explore this 
structural space more freely during folding. During the folding of RNA, magnesium 
counterions bridge and neutralize the negative charges between helix backbones, 
triggering a collapse into more compact structures [43]. Stabilization by binding 
metal ions has the effect of deepening the free energy curve for the fully folded 
structure over misfolded structures, in effect channeling folds to the correct structure 
given enough time [44]. 

Deepening the energy well for the desired fold can be used to rationally design 
RNA structures for which junctions fold rapidly and stably into the correct stacking 
conformation. The strategy of forming geometric closures through long-range inter- 
actions is widespread in biology [24] and is an effective way to stabilize multi- 
helix junctions. The hairpin ribozyme uses a long-range interaction to specify the 
antiparallel orientation of the junction [45], and likewise, other arrangements can 
be programmed based on the context of tertiary interactions. For example, embedding 
a loop–receptor interaction into two stems of a four-way junction can strongly 
favor a parallel-helix stacking orientation that aligns the loop and receptor [46]. In 
RNA origami structures, the same strategy is used, and both parallel and antiparallel 
arrangements can be specified by choosing the proper length of the KL-crossover 
[23]. 

In paper origami, the order in which the folds are made can lead to different 
stacking orders of the paper layers (Fig. 5c). For a given crease pattern, determining 
a valid stacking order is computationally involved because of how interdependent 
the folds are [36]. Consider the simple placement of just two adjacent 
valley folds (Fig. 5c): if either segment A or C is longer than B, then one of the two 
possible stacking orders becomes conformationally blocked. The context dependence 
of folding makes it difficult to predict how facets in a crease pattern may interact, as 
what happens at the crease between segments B and C depends on how segment A 
was folded. In larger origami designs, these dependencies can be far more complex, 
often linking folds on opposite ends of a sheet [36]. 

Not surprisingly, context dependence is a major factor in RNA folding as well; just 
as the folding of paper can block later creases from folding, the folding of strands into 
pseudoknots during transcription can create folding barriers [47]. Locally, the energy 
of each base pair is dependent on its neighbors, and globally, the final compaction 
of helices into three-dimensional structure is driven by the stacking of helices at 
junctions and by the cooperativity of numerous weaker packing interactions [48]. 
The order in which the folds occur within an RNA origami can have a large influence 
on the outcome. As an RNA strand folds cotranscriptionally, the rate of folding into 
helices is many times faster than the rate of strand synthesis; long-range interactions 
such as KLs form comparatively slowly, giving helices and junctions time to stabi- 
lize before they lock into place. This hierarchy in rates makes it possible to design 
arbitrarily mazelike strand paths that arrange perfectly when KLs link up. Since each 
KL contributes roughly 10 kcal/mol of binding energy in typical folding conditions, 
the total energetic contribution from many KLs can be substantial. 

RNA origami is held together by numerous KLs working in concert, and the 
order in which the loops pair up in an RNA origami can lead to the formation of 
structural barriers. ROAD design software attempts to predict structural barriers by 
analyzing the position of KLs relative to the rest of the structure to see if any unpaired 
helices are nested within [25]. It is, however, up to the designer to refine the folding 
path based on the feedback. A recently developed program designed to optimize 
tree structures for cotranscriptional folding can help with this process by proposing 
structural variations that may have fewer barriers [49]. 

To illustrate the challenge of designing for sequential folding, consider a complex 
RNA tree (Fig. 6a) with branch lengths designed to fill the space of a rectangle 
(Fig. 6b). If we trace a path outlining the tree, it will make a representation of its 
secondary structure (Fig. 6c). Depending on where the 5’ and 3’ ends are placed on 
this circular path, different strand paths can be generated that represent the same 
fold; here, two examples are illustrated (Path1 and Path2, Fig. 6c). In the first 
example (Path1, Fig. 6d), the polymerase produces a long and winding structure 
that folds into a rectangular tile, condensing mostly at the end as the last KLs are put 
into place. At every intermediate position of the folding, there are no issues with the 
strand intersecting itself. In a second folding example (Path2, Fig. 6e), the location 
of the 5’ end is moved such that several KL pairs are made very early in the fold. 
When the KLs lock into place in this example, they create several looped-out portions 
of single strand that still need to pair (Fig. 6e, orange strands). Since the RNA forms a 
helical turn every 11 base pairs, when the complementary portion of any of the longer 
orange sections is produced, it is going to encounter a structural barrier because the 
strands cannot intersect. As a result, some KLs will have to unpair before the red-colored 
portion can fold correctly (Fig. 6e, red strands). Cotranscriptional folding 
experiments comparing these two designs found that the Path1 design resulted in a 
considerably higher yield of correctly folded products compared to the Path2 design 
(Fig. 6d, e, right) [25]. While both Path1 and Path2 produced folded products, cotranscriptional 
folding via Path1 produced a much more homogeneous product at a higher 
yield. Furthermore, only Path1 produced TEM data of high enough quality to create 
an ab initio reconstruction of the RNA origami (Fig. 6d, right). 
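
The effect of moving the 5’ end around the circular strand path can be explored with a domain-level toy model. This is a deliberately simplified sketch and not the analysis performed by ROAD: each label appears exactly twice in the token list (the two strands of a helix, or the two loops of a KL), a KL is assumed to lock as soon as the hairpin carrying its second loop has closed (a worst-case assumption), and a helix is flagged only if it is at least one helical turn long. All names, tokens and lengths are hypothetical.

```python
def occurrences(tokens):
    """Map each label to the (first, second) index at which it is transcribed."""
    seen = {}
    for idx, label in enumerate(tokens):
        seen.setdefault(label, []).append(idx)
    return {label: tuple(ix) for label, ix in seen.items()}

def barriers(tokens, helix_len, turn=11):
    """Flag (KL, helix) pairs where a long helix is still half-made when the KL locks."""
    occ = occurrences(tokens)
    helices = {h: occ[h] for h in helix_len}
    flagged = []
    for label, (i, j) in occ.items():
        if label in helix_len:
            continue  # only kissing loops are examined
        # Assumption: the KL locks once the innermost helix around its second loop closes.
        hosts = [(q - p, q) for p, q in helices.values() if p < j < q]
        t_close = min(hosts)[1] if hosts else j
        for h, (p, q) in helices.items():
            # First strand trapped inside the KL-closed loop, second strand made later.
            if i < p < t_close < q and helix_len[h] >= turn:
                flagged.append((label, h))
    return flagged

def rotate(tokens, start):
    """Move the 5' end to a different point on the circular strand path."""
    return tokens[start:] + tokens[:start]
```

With `tokens = ["A", "K", "A", "B", "C", "K", "C", "B"]` and `helix_len = {"A": 8, "B": 12, "C": 8}`, `barriers(tokens, helix_len)` flags `("K", "B")`, because the first strand of the 12 bp helix B is enclosed when the KL locks. After `rotate(tokens, 3)` the same fold is synthesized in a different order and nothing is flagged, which mirrors the qualitative difference between the two strand paths.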

Misfolding appears to be a natural consequence of producing more complex folds. 
As more layers and more interactions between folds are added to a structure, the 
chance to create bifurcations during folding increases. Just as the wrong order of 
folds in a paper origami can lead to a conformation that blocks further folding 
(Fig. 5), the wrong order of synthesis through a single-stranded design can also 
lead to the formation of roadblocks to folding (Fig. 6e). The directional synthesis of 
RNA is one way to solve the problem of folding bifurcations, both with respect to 
the order of condensing helices, but also when alternative structures with equivalent 
or similar energies are present (Fig. 4b). Indeed, in the classic example of bistability, 
riboswitches take advantage of the slow and directional synthesis to control their own 
folding landscapes (Fig. 4d) [32]. Likewise, natural RNAs have evolved clever ways 
to cope with conformational multistability, as the hairpin ribozyme uses long-range 
interactions to lock the correct conformation of a junction into place [42]. Lastly, 
having a strand path that avoids getting tangled while it folds can result in higher 
yields and more homogeneous folded RNAs (Fig. 6d) [25]. 


4 Folded Origins 


The word ‘Origami’ derives from two Japanese words: ‘ori’ meaning to fold and 
‘kami’ meaning paper. Rather than describing the final folded product, the name 
origami refers to the active process by which it is produced. United by the idea 
of growth from folds, paper folding and biomolecular folding are both ultimately 
expressions of life. The two types of folding are governed by two limitations that 
are at the core of the concept of folding in general: The first is the property of 
non-intersection (injectivity) and the second is the property of non-stretching and 
non-compression of the material (isometry) [36]. Out of these two properties, almost 
all of the other features of folding emerge as a consequence. 

Fig. 6 RNA folding pathways. a a tree structure describing the secondary structure of an RNA 
origami. b the same tree, redrawn according to interactions between leaves on the tree. c the tree 
redrawn as a circular strand path, aligning interacting helices. Two candidate positions (Path1 
and Path2) place the 5’-end starting point at different points in the strand; the arrow indicates strand 
direction. d Path1, beginning at the right edge, has a long circuitous path. An illustration of hypothetical 
folding intermediates is shown. A model of the folded structure is shown on the right, 
along with an ab initio structure produced by analyzing TEM micrographs. e Path2, starting in the 
middle of the design, forms a stable core structure with many long loops (highlighted in orange). As 
the structure folds, it can reach a midpoint where the continued folding is blocked (shown in red) 
because the growing chain needs to wrap around a trapped region (orange) to pair with it. Yields of 
Path2 were lower than for Path1, and the TEM data for Path2 were not of high enough quality for an 
ab initio reconstruction. Figure adapted from reference [25] 

Paper folding provides a tactile and intuitive way to understand the concept of 
biological folding landscapes, especially conceptually difficult ideas like folding 
traps. Paper is a versatile medium: it can be cut and glued, it has memory in the form 
of creases, and it can be dampened and reshaped [13]. It turns out that nucleic acids 
have many of the same properties: ribozymes and ligases do the cutting and 
gluing, while base pairing and magnesium-induced stabilization provide the memory and 
glue, achieving effectively a similar result. In this way, DNA and RNA origami 
become a natural extension of an art form that has thrived for over a thousand years. 

In both paper origami and RNA origami, we encounter numerous types of bifur- 
cations in folding pathways and solve them in similar ways. From multistable helices 
in riboswitches to the many conformers of helical junctions, the problem of folding 
always requires consideration of its deployment when building structure. In the final 
vision of any architecture, there are often many unseen layers of logistics and orga- 
nization that are critical to enable the construction. Perhaps, this is the most elegant 
aspect of it all, that there is a hidden beauty to folded architecture that is concealed 
by the dynamic process of its folding. 

In spite of how well the analogy between origami and molecular folding works, 
there are also some interesting areas of contrast to mention: In origami, even ‘thick’ 
origami, the paper is usually much larger in dimensions compared to the thickness 
of the paper, so much so that the thickness of origami is easily idealized as zero- 
thickness in simulations. For RNA origami, a corresponding analogy might be to 
imagine the strand having nearly zero-thickness as well. However, when we zoom 
down to the scale of molecules, the thickness of the RNA helix and the length of 
the smallest helical features that can be designed are about the same, ~2 nm. As a 
result, RNA designs that have more structural features need to be made considerably 
larger than structures with fewer features—unlike paper which can be folded both 
more densely and smaller to achieve more features. In a particularly sharp contrast, 
when a biopolymer strand condenses into a misfolded state, it takes much longer to 
unfold than it does to misfold in the first place; with paper origami, it is quite the 
opposite, and it is much more difficult to fold than it is to unfold! 

The art of origami is an exploration of possibilities reached by folding and so 
is a reflection of life that is created through the act of folding. The RNA world 
hypothesis proposes that self-replicating biopolymers could have played a key role in 
the beginning of evolution [50]. In theory, life may have originated from the folding 
of a lifeless strand. Very recently, the Unrau lab made amazing progress toward 
producing a fabled ‘replicase’, a sequence of RNA that can copy itself [51]. The new 
ribozyme has a sense of ‘self’, using a bistable clamping mechanism that recognizes 
its own promoter sequence, demonstrating again how bistability is a fundamentally 
important property of living polymers. RNAs in nature navigate folding pathways 
to activate different functions in a context-dependent way. Today, we are only just 
beginning to unlock the secrets behind designing and folding RNA. An exciting new 
era for molecular structure that can deploy cotranscriptionally awaits! 
Ok: A Kinetic Model for Locally
Reconfigurable Molecular Systems


Pierre Marcus, Nicolas Schabanel, and Shinnosuke Seki 


Abstract Oritatami is a formal model of RNA co-transcriptional folding, in which 
an RNA sequence (transcript) folds upon itself while being synthesized (transcribed) 
out of its DNA template. This model is simple enough for further extension and also 
strong enough to study computational aspects of this phenomenon. Some of the 
structural motifs designed for Turing universal computations in oritatami have been 
demonstrated approximately in-vitro recently. This model has yet to take a significant
aspect of co-transcriptional folding into full account, that is, reconfiguration of
molecules. Here we propose a kinetic extension of this model called the oritatami
kinetic (Ok) model, similar to what the kinetic tile assembly model (kTAM) is to the
abstract tile assembly model (aTAM). In this extension, local rerouting of the transcript
inside a randomly chosen area of parameterized radius competes with the transcription
and the folding of the nascent beads (beads are abstract monomers which are the
transcription units in oritatami). We compare this extension to a simulation of oritatami
in the nubot model, another reconfiguration-based molecular folding model. We show
that this new extension better matches a reconfiguration model and is also faster to
simulate than passing through a nubot simulation.


1 Introduction 


Transcription is a phenomenon in which a system encoded on a DNA sequence is
copied sequentially out of ribonucleotides by an RNA polymerase into an RNA
transcript. The particularity of this process is that the transcript folds upon itself into
an intricate structure while being synthesized, that is, co-transcriptionally.
Fig. 1 Transcription and RNA origami [6]: An RNA polymerase (colored in orange) scans a
DNA template (gray) and maps its sequence, nucleotide by nucleotide (A, T, C or G), into an RNA
sequence (the transcript) according to the lossless function A → U, C → G, G → C, and T → A. The
transcript folds upon itself with high probability into a precise structure that can be programmed
from the sequence of the DNA template


This phenomenon, called co-transcriptional folding, has proven to be pro- 
grammable in-vitro. Indeed, in [6] Geary, Rothemund and Andersen demonstrated 
how to encode a rectangular tile-like structure in a transcript (actually, in its corre- 
sponding DNA template) so that following its folding pathway, the transcript folds 
co-transcriptionally into the target structure (Fig. 1). The design of such an RNA 
origami architecture has been highly automated by their software RNA Origami 
Automated Design (ROAD) [2]. ROAD “extends the scale and functional diver- 
sity of RNA scaffolds” so that they might be large and functional enough even to 
accommodate simple enough computation. 

Besides serving as a scaffold for computation, co-transcriptional folding itself is 
capable of computing by encoding several folding pathways into a single transcript 
and letting an appropriate one be “called” depending on the environment [8, 9, 12]. 
The oritatami model was introduced in [3] to explore theoretically the computation 
capabilities allowed by co-transcriptional folding. It was first demonstrated to be 
capable of counting in binary [3, 7] and then of simulating arbitrary cyclic tag systems
[5]. As such, the oritatami model is efficiently Turing universal: it can simulate
arbitrary Turing machines with only a quadratic-time slowdown. Precisely, a cyclic
tag system (CTS) is a binary word (over $\{0, 1\}$) rewriting system that consists of an
initial tape word $w^0$ and a finite cyclic list of productions (binary words) and yields a
sequence of nonempty tape words $w^0, w^1, w^2, \ldots$; at step $i \geq 1$, it rewrites the tape
word $w^{i-1}$ into $w^i$ by (1) appending the current production at the end of $w^{i-1}$ if and
only if the leftmost letter of $w^{i-1}$ is 1 (and appending nothing otherwise), and then (2)
deleting this first letter, and (3) rotating the cyclic list.
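
The three rewriting steps of a cyclic tag system can be illustrated with a short sketch. The function name `cts_run`, the string encoding of tape words, and the `max_steps` cutoff are assumptions made for this illustration only; they do not come from [5].

```python
def cts_run(initial_word, productions, max_steps):
    """Yield the tape words w^0, w^1, ... of a cyclic tag system.

    `initial_word` is a nonempty string over {"0", "1"} and `productions`
    is the cyclic list of binary words. The run stops early if the tape
    word becomes empty.
    """
    word = initial_word
    yield word
    for step in range(max_steps):
        # (3) rotating the cyclic list amounts to indexing it modulo its length
        production = productions[step % len(productions)]
        # (1) append the current production iff the leftmost letter is 1
        if word[0] == "1":
            word += production
        # (2) delete the leftmost letter
        word = word[1:]
        if not word:
            return  # no nonempty tape word is left
        yield word


if __name__ == "__main__":
    for w in cts_run("1", ["011", "10", "101"], 4):
        print(w)
```

With the (arbitrary) productions above and initial word 1, the first rewriting appends 011 and deletes the leading 1, giving 011.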

The oritatami CTS simulator [5] encodes the cyclic list of a given CTS in each 
period of its cyclic (periodic) transcript. The encoded list folds into a compact shape 
(called switchbacks) by default, unless a production must be appended at the end 
of the current tape word, in which case the current production inside that list folds 
in a self-supported expanded shape (called glider), extending the current tape word 
accordingly. 


Multiple configurations in co-transcriptional molecular folding computing. Just as
computation is achieved in tile assembly systems by gluing tiles together in different
configurations in response to their surroundings, computation is achieved in the
co-transcriptional folding model by having the transcript fold into different shapes
in response to its environment, i.e., its context. Some of the RNA structures created
by ROAD [2] resemble structural motifs used for computing in oritatami, such as
glider and switchbacks. That being said, the ability to encode two addressable shapes
into one single sequence has not been demonstrated experimentally yet. Such
compatibility between two foldable shapes, however, might be dispensable. Indeed, the
Turing universal transcript in oritatami has been simplified significantly in [10]. In 
this later work, the periodic transcript folds co-transcriptionally into a space-time 
diagram of a 1D cellular automaton (CA). Each period of the transcript folds into a 
macrocell, an upscaled version of the corresponding simulated CA cell. The macrocell
is divided into three parts: (1) the first detects the absence of a neighboring cell
and builds the missing boundary in that case, (2) the second builds the inner shell
of the macrocell, and (3) the third part reads the input bits encoded on its NW and
SW borders, and writes the output bits on its SE and NE borders. Only the first part
requires a compatibility between two non-trivial patterns (actually, between glider
and switchback). The inner shell is indeed hardcoded in the second part and uses
only gliders and 60°- and 120°-turns. The third part consists, first, of reading gliders
that get locally flattened when facing beads of specific types and, second, of a
flat line encoding a transition table. The reading gliders get flat when passing by
the parts of a neighboring macrocell border encoding a 1, which shifts forward the
upcoming transition by an appropriate amount so as to expose only the output entry
corresponding to the input. An interesting feature of this new I/O interface is its
tolerance to misalignment of macrocells, a typical desired outcome in the presence of
molecular reconfigurations, even though the macrocells never get misaligned in the
deterministic oritatami system.


The need for a co-transcriptional reconfiguration model. As opposed to the tile
assembly model, computing using molecular co-transcriptional folding has not yet been
demonstrated experimentally, even though it has been shown theoretically possible
thanks to the oritatami model. One main obstacle is the lack of a
model which would take into account the probabilistic nature of experimental settings.
This was solved for tile assemblies by introducing the kinetic tile assembly
model (kTAM), which provided useful hints, such as proofreading tiles [13], which
in turn allowed successful experimental implementations of computing
nanostructures [1, 14]. Co-transcriptional folding is significantly different from tile
assembly as it is not composed of independent entities but requires that all the beads
or monomers composing the resulting structure be connected by a path. The nubots
model introduced in [14] includes reconfigurations and allows building structures
with this kind of constraint, and even much more complex ones. Even if there is a
way to simulate oritatami systems with nubots, this simulation is indirect and passes 
through unnatural intermediate states that should not be considered in a kinetic model 
(Sect. 2). This motivates the introduction of a new model, called the Oritatami kinetic
(Ok) model, which extends the oritatami model to include thermal reconfigurations
of the folding structure. We hope this model to be powerful yet simple enough to
design co-transcriptional folding schemes that will be robust enough to be
implemented in-vitro. Furthermore, one might hope that it may also open new ways of
designing co-transcriptionally folding structures taking advantage of these new features
(Fig. 2).


Fig. 2 Oritatami model: From left to right, the growth from bead E12 to bead E18 of a
self-supported glider with delay $\delta = 3$, transcript $p = E12 \ldots E23$ and rule
$\{E12 \heartsuit E17, E14 \heartsuit E21, E18 \heartsuit E23, E20 \heartsuit E15\}$. At each step, the set of
nascent paths maximizing the number of bonds is shown. The nascent beads are highlighted
in bold black. The nascent paths are drawn in bold black until the last bond made and end
in colors when their tail is free to move (i.e., is not bound by any bond)


2 Molecular Reconfiguration: Oritatami and Nubots 


Let us first compare the oritatami and nubots models. 


Oritatami model. An oritatami system consists of a “molecule” (the transcript)
consisting of a sequence of “beads” (monomers) that attract each other according to
a given binary relation $\heartsuit$ called the rule. The molecule grows in the triangular lattice,
by one bead per step. At each step, the $\delta$ most recently produced nascent beads are
free to move around to look for the position that maximizes the number of bonds
they can make with each other or with beads placed already (hence the folding is
co-transcriptional). Once that optimal position is found, the oldest nascent bead
adopts this position forever. Then, a new nascent bead is produced (according to the
transcript sequence) and the process continues by optimizing the position of the new
$\delta$ nascent beads. The transcript, i.e., the sequence of beads, is assumed to be finite
or periodic. The parameter $\delta$ is called the delay. The folding starts from an initial
configuration called the seed. A time- and space-efficient and easy-to-use oritatami
model simulator is freely available at [11].
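
One optimization step of this dynamics can be sketched by brute force. The code below is not the simulator of [11]: the axial coordinates for the triangular lattice, the encoding of the rule as a set of `frozenset` pairs (a self-attraction A ♥ A would be `frozenset({"A"})`), and the names `neighbors` and `oritatami_step` are all assumptions made for this illustration. It assumes that covalently linked (consecutive) beads do not count as bonded, and it ignores the case where no nascent path of full length exists.

```python
# Six unit directions of the triangular lattice in axial coordinates (assumed encoding).
DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def neighbors(p):
    return [(p[0] + dx, p[1] + dy) for dx, dy in DIRS]


def oritatami_step(placed, nascent_types, rule):
    """Return (best bond count, optimal positions for the oldest nascent bead).

    `placed` is the list of (position, bead type) already fixed, in transcript
    order; `nascent_types` lists the bead types of the delta nascent beads.
    """
    occupied = dict(placed)
    last = placed[-1][0]
    best, best_first = -1, set()

    def bonds(path):
        total = 0
        for i, p in enumerate(path):
            for q in neighbors(p):
                if q in occupied:
                    if i == 0 and q == last:
                        continue  # link to the predecessor along the transcript
                    if frozenset((nascent_types[i], occupied[q])) in rule:
                        total += 1
                elif q in path:
                    j = path.index(q)
                    # count each non-consecutive nascent pair once
                    if j > i + 1 and frozenset((nascent_types[i], nascent_types[j])) in rule:
                        total += 1
        return total

    def extend(path):
        nonlocal best, best_first
        if len(path) == len(nascent_types):
            b = bonds(path)
            if b > best:
                best, best_first = b, {path[0]}
            elif b == best:
                best_first.add(path[0])
            return
        tip = path[-1] if path else last
        for q in neighbors(tip):
            if q not in occupied and q not in path:
                extend(path + [q])

    extend([])
    return best, best_first
```

The search visits every self-avoiding nascent path, so its cost grows exponentially with the delay; it is only meant to make the definition concrete.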


Nubots model. Nubots is a general purpose model capturing many (if not all) aspects
of 2D molecular reconfigurability. Nubots are grown in the triangular lattice as well.
They are composed of monomers that interact with their lattice neighbors in a
non-deterministic manner. At each time step, a monomer can change its internal state,
create a new neighbor monomer, or conversely disappear, change the nature of its
bond with one of its neighbors (none, rigid or flexible), or move around one of its
bonded neighbors taking along a part of its bond-connected component. Each of these
possibilities is described in a list of possible actions between any pair of neighboring
monomers according to their respective internal states (Fig. 3). As for the oritatami
model, the process starts from an initial configuration called the seed. This model
takes advantage of both local and parallel reconfigurability to build large structures
in a logarithmic number of parallel updates only.


Fig. 3 Nubots model: (Figure extracted from [14]) a A monomer in state a at a position
in the triangular grid coordinate system $(\vec{x}, \vec{y}, \vec{w})$. b Examples of monomer
interaction rules, written formally as follows:
$r_1 = (1, 1, \mathrm{null}, \vec{x}) \to (2, 3, \mathrm{null}, \vec{x})$,
$r_2 = (1, 1, \mathrm{null}, \vec{x}) \to (1, 1, \mathrm{flexible}, \vec{x})$,
$r_3 = (1, 1, \mathrm{rigid}, \vec{x}) \to (1, 1, \mathrm{null}, \vec{x})$,
$r_4 = (1, 1, \mathrm{rigid}, \vec{x}) \to (2, 3, \mathrm{flexible}, \vec{x})$,
$r_5 = (b, \mathrm{empty}, \mathrm{null}, \vec{x}) \to (1, 1, \mathrm{flexible}, \vec{x})$,
$r_6 = (1, a, \mathrm{rigid}, \vec{x}) \to (1, \mathrm{empty}, \mathrm{null}, \vec{x})$, and
$r_7 = (1, 1, \mathrm{rigid}, \vec{x}) \to (1, 2, \mathrm{rigid}, \vec{y})$. For
rule $r_7$, the two potential symmetric movements are shown corresponding to two choices for arm
and base, one of which is non-deterministically chosen


Oritatami and nubots. One approach to introduce reconfigurability in the oritatami 
model is to implement it in the nubot model. 


Theorem 1 Any oritatami system can be implemented as a nubot. 


Proof Consider an oritatami system O with seed $\sigma$, periodic transcript $p$, rule $\heartsuit$
and delay $\delta$. The simulating nubot will consist of two kinds of monomers: placed
monomers and nascent monomers. Placed monomers correspond to the beads that
have already been placed at their final location; their internal state will be the bead
type of the corresponding bead according to the periodic transcript $p$. The nubot
will grow and retract a chain of monomers linked by rigid bonds corresponding to
the folded oritatami molecule. Nascent monomers correspond to the $\delta$ nascent beads
of the simulated oritatami system; their internal state will encode not only the bead
type of their corresponding bead but also some finite amount of information allowing
to explore one by one all the possible paths, so as to compute the paths that
maximize the number of bonds that the nascent beads can make in the simulated
oritatami configuration. Note that the simulating nubot will conduct the path
exploration by a simple depth-first search where each nascent monomer spawns its child
nascent monomer in every possible direction in a recursive manner. To conduct this
exploration, each nascent monomer only needs to remember the currently explored
direction, the current best number of bonds made by its descendants in some already
explored directions, and its index in the transcript (to specify its bead type). As
the total number of feasible bonds may not exceed $4\delta + 1$ (4 for each intermediate
nascent monomer and 5 for the trailing one), the required number of internal states
is $O(\log \delta + \log |p|)$, that is constant. Each parent nubot keeps the best number of
bonds among all of its children and adds its own number of bonds with the current
environment, and then sends it to its parent before disappearing, until the process
reaches back the path root nubot. This root will then be able to extend the oritatami
path in the direction of the best move and will then restart the exploration from there.


This simulation of an oritatami system by a nubot is however unsatisfying as
it requires the regular disappearance of monomers and thus introduces unnatural,
and thus undesirable, intermediate states in the simulation. Other nubots simulation
schemes exist at scale 1 that do not require the disappearance rule, but require instead
disconnecting the nascent beads to rotate them around the environment, or making
room by pushing the environment around, but both of these intermediate states are
equally undesirable.


Claim Nubots cannot mimic the nascent beads moves at scale 1 without the disap- 
pearance rule nor disconnecting the nascent beads. 


Indeed, consider a seed configuration consisting of a Y-shaped tunnel with two
arms of length $\delta$ where the oritatami system starts at the intersection. There are
exactly two bead types, A and B, and the only attraction rule is $A \heartsuit B$. Its transcript
is $A^\delta$ (A repeated $\delta$ times). The walls of the tunnels are made of As but the bead
placed at the end of each tunnel can either be A or B. In order to simulate faithfully
the oritatami system, the nubot must reach both ends of the tunnel (otherwise one
could exchange the two beads at the ends of the tunnels and invalidate the simulation).
As no part of the grown nascent arm can be moved in any direction without being
disconnected, there is no other possibility than erasing the nubots grown when the
wrong tunnel is explored first, which can be guaranteed by placing B in the second
tunnel to be explored by the nubots.


3 The Ok model 


The previous section showed that nubots are not well suited as a basis for a dedicated
kinetic model of co-transcriptional folding because it would require passing through
many unrealistic and time-costly intermediate states. Furthermore, in order to get 
closer to nature, we want, as in kTAM, our kinetic oritatami model to randomly 
reconfigure parts that have been folded already. As the connectivity of the path (made 
of covalent bonds) must be maintained, this would require lots of computation steps 
as well as lots of memory if modeled inside the nubots world. 
3.1 Reconfiguration Events


In the oritatami kinetic (Ok) model, the dynamics changes. Instead of optimizing the 
nascent beads positions after each extension of the transcript by one bead, there are 
three kinds of reconfiguration events that are competing with each other: 


Growth: A new bead is added at the end of the growing molecule. Its bead type is
given by the transcript. It is placed at an unoccupied location chosen uniformly at
random next to the current end of the path. If none of these locations is unoccupied,
the growth fails and will be retried the next time the growth event is triggered.

Nascent beads reconfiguration: The path followed by the $\delta$ nascent beads (the
$\delta$ most recently produced) is rerouted non-deterministically without overlaps,
according to some probability distribution to be discussed next.

Internal reconfiguration: A hexagon $H$ of radius $\rho$ is picked at random and
all the subpaths inside this hexagon $H$ are rerouted non-deterministically while
keeping their extremities on the borders of $H$ unchanged (see Fig. 4).


Figure 4 gives an example of a reconfiguration. Note that none of the reconfiguration
events directly involves any bonding scheme optimization process, as opposed
to the regular oritatami model. As for kTAM, this optimization will be induced by the
rates at which these various reconfigurations are applied, as will be detailed in the
following section.
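
As an illustration of the simplest of the three events, the Growth event can be sketched as follows. This is only a sketch under assumptions: the molecule is stored as a list of lattice positions in axial coordinates, and the name `growth_event` and its boolean return value are hypothetical conventions, not part of the model definition.

```python
import random

# Six unit directions of the triangular lattice in axial coordinates (assumed encoding).
DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def growth_event(path, rng):
    """Try to extend `path` (list of occupied positions) by one bead.

    The new position is drawn uniformly among the unoccupied neighbors of
    the current end. Returns False, leaving `path` unchanged, when the end
    is fully surrounded, so that the growth can be retried later.
    """
    occupied = set(path)
    x, y = path[-1]
    free = [(x + dx, y + dy) for dx, dy in DIRS if (x + dx, y + dy) not in occupied]
    if not free:
        return False
    path.append(rng.choice(free))
    return True
```

Passing the random generator explicitly (for instance `random.Random(42)`) keeps a simulation run reproducible.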


3.2 Reconfiguration Distributions and Events Rates 


Each of the reconfiguration events Growth, Nascent beads, and Internal reconfiguration
will be triggered according to an exponential random variable of respective
rates $r_G$, $r_N$ and $r_I$. The growth rate $r_G$ only depends on the transcription speed of
the polymerase (which may depend in turn on concentrations, temperature, ...). The
nascent beads and internal reconfiguration rates, $r_N$ and $r_I$, do however depend on
the local configuration where they are applied: the more bonds are made, the more
stable the local configuration is and the less likely a reconfiguration is.


Local reconfiguration random distribution. In the Ok model, we assume that when 
a reconfiguration (nascent or internal) is applied, the new local configuration (of the 
nascent path or of the subpaths in the hexagon H) is drawn uniformly at random 
among all the valid local reconfigurations. Together with the upcoming definition of 
the rates, this ensures that the resulting distribution follows the Boltzmann law.


Reconfiguration events rates. Following the steps of [13], we define the rate of
each reconfiguration as a function of the number of bonds involved: a configuration
involving $b$ bonds will dissociate at a rate inversely proportional to some exponential
of $b$. Precisely, as in [13] we define the free energy of dissociation of a single bond
to be $\Delta G / RT = G_I$, and thus $G_I$ contains a mix of entropic and enthalpic factors
related to the formation of the bond between the two beads, measured in units of
$RT$. The rate of the next internal reconfiguration event in a given hexagon $H$ is then
defined as:


$$r_I = k_f \exp(-b\, G_I), \tag{1}$$


where $b$ is the number of bonds involving some bead strictly inside the selected
hexagon $H$. $k_f$ is a kinetic parameter, usually set to $k_f = 10^{6}$.

To take into account the specificity of co-transcriptional folding where the nascent
beads, close to the polymerase, fold at a much higher rate, we define similarly the free
energy of dissociation of a single nascent bond (i.e., involving at least one nascent
bead, and thus catalyzed by the proximity to the polymerase) to be $\Delta G / RT = G_N$.
The rate of the next nascent reconfiguration event is then defined as:


$$r_N = k_f \exp(-b\, G_N), \tag{2}$$


where $b$ is the number of nascent bonds, i.e., bonds involving at least one nascent bead.
In order to express the rate $r_G$ of the growth of the transcript in the same terms,
we introduce, as in [13], a fictitious energy $G_G$ such that:


$$r_G = k_f \exp(-G_G). \tag{3}$$
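
The three rate definitions translate directly into code. The sketch below is only illustrative: the function names are hypothetical, and the kinetic parameter is passed explicitly as `k_f` rather than fixed to any particular value.

```python
import math


def internal_rate(b, G_I, k_f):
    """Rate (1) of an internal reconfiguration in a hexagon involving b bonds."""
    return k_f * math.exp(-b * G_I)


def nascent_rate(b, G_N, k_f):
    """Rate (2) of a nascent reconfiguration involving b nascent bonds."""
    return k_f * math.exp(-b * G_N)


def growth_rate(G_G, k_f):
    """Rate (3) of the growth event, expressed with the fictitious energy G_G."""
    return k_f * math.exp(-G_G)
```

Each additional bond divides the corresponding reconfiguration rate by a factor of exp(G), which is what makes well-bonded local configurations more stable.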


Adjusting the rates. In order to match the oritatami dynamics, the nascent beads must
explore up to $5^\delta$ paths, so as to find the optimal nascent path, before the next nascent
bead is produced. This exploration requires on average $\sim \delta \ln 5 \cdot 5^\delta$ uniform random
nascent reconfiguration events, as collecting $N$ coupons takes on average $N \ln N$
trials, and thus $\sim \delta \ln 5 \cdot 5^\delta / r_N$ time as each event occurs every $1/r_N$ on average.
In order to match the co-transcriptional folding experimental observation that the
growth occurs at a much lower rate than the optimal folding of the nascent beads,
we must then have:


$$\frac{\delta\, 5^\delta \ln 5}{r_N} \ll \frac{1}{r_G}, \quad \text{that is: } r_N \gg \delta\, 5^\delta \cdot r_G, \quad \text{i.e., } b\, G_N \lesssim G_G - \delta \ln 5. \tag{4}$$


Note that this is consistent with the fact that the number of nascent bonds $b$ is at
most $4\delta + 1$: each nascent bond must account for a fixed amount of energy, that is:
$G_G = \delta(4 G_N + \ln 5)$. In particular we should have: $G_G \geq \delta \ln 5/4$ so that $G_N > 0$.
This first rough estimation however needs to be confirmed by running effective
simulation of the Ok model.

There are no explicit constraints between $G_I$ and $G_N$ besides the fact that $G_I > G_N$,
as the nascent bonds should dissociate more easily than the (colder) bonds
located away from the polymerase.
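
To get a feel for the orders of magnitude involved, the coupon-collector estimate and the relation between $G_G$ and $G_N$ can be evaluated numerically. This is a sketch only: the function names are hypothetical, and the default `b = 4 * delta` is the approximation of the bound $4\delta + 1$ that yields $G_G = \delta(4 G_N + \ln 5)$.

```python
import math


def expected_exploration_events(delta):
    """Coupon-collector estimate N ln N with N = 5**delta candidate paths."""
    n_paths = 5 ** delta
    return n_paths * math.log(n_paths)


def nascent_energy_bound(delta, G_G, b=None):
    """Value of G_N saturating b * G_N <= G_G - delta * ln 5 (b defaults to 4 * delta)."""
    if b is None:
        b = 4 * delta
    return (G_G - delta * math.log(5)) / b


if __name__ == "__main__":
    for delta in (1, 2, 3, 4):
        print(delta, round(expected_exploration_events(delta), 1))
```

For delay 3 there are 125 candidate paths, and the estimate is 125 ln 125 nascent reconfiguration events between two growth events.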
3.3 Implementing the Ok model


We do not have a running implementation of the Ok model yet. Here is a list of
recommendations for its software implementation.


Ok model possible implementation. The Ok model is parametrized by $(\delta, \rho, G_G, G_I, G_N)$.
The possible events are:


• growth: extending the molecule by one bead whose position is chosen uniformly
at random among the unoccupied positions neighboring the current end of the
molecule;

• internal(x): rerouting uniformly at random the subpaths strictly inside the hexagon
$H(x, \rho)$ centered on position $x$ with radius $\rho$, without modifying their extremities
on the boundary of $H(x, \rho)$;

• nascent: rerouting uniformly at random the path consisting of the $\delta$ nascent beads
at the end of the molecule.


There are as many internal(x) events as there are hexagons $H(x, \rho)$ intersecting
the molecule. Each event is associated with an occurrence time $T$ picked at random
according to its corresponding rate $r_G$, $r_I$ or $r_N$. Note that the rates $r_I$ and $r_N$ depend
on the current local configuration. As $T$ is a memory-less exponential random variable
of law $\mathrm{Exp}(r)$, s.t. $\Pr\{T > t\} = e^{-rt}$, it can easily be picked using the formula:
$T = -(\ln U)/r$ where $U$ is a uniform real-valued random variable over $[0, 1]$. The
event scheduling is classically implemented using a priority queue to extract the
next upcoming event (with the lowest occurrence time). Thanks to the memory-less
property, the occurrence times of all events impacted by an applied event are simply
redrawn according to their recomputed rates: for instance, the occurrence times of
the internal(y) events need to be updated for all $y \in H(x, \rho)$ after an internal(x)
event occurred.
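
The scheduling scheme described above can be sketched with a binary heap. This is one possible design, not a reference implementation: the class `EventQueue`, the lazy invalidation of outdated entries through version counters, and the use of arbitrary hashable event keys (such as `"growth"` or `("internal", x)`) are assumptions made for this illustration.

```python
import heapq
import itertools
import math
import random


def draw_time(rate, rng):
    """Sample T ~ Exp(rate) using T = -ln(U) / rate."""
    u = 1.0 - rng.random()  # lies in (0, 1], which avoids log(0)
    return -math.log(u) / rate


class EventQueue:
    def __init__(self, rng=None):
        self.rng = rng or random.Random()
        self.heap = []
        self.version = {}          # latest version number of each event
        self.tie = itertools.count()
        self.now = 0.0

    def schedule(self, event, rate):
        """(Re)draw the occurrence time of `event`; older entries become stale."""
        v = self.version.get(event, 0) + 1
        self.version[event] = v
        t = self.now + draw_time(rate, self.rng)
        heapq.heappush(self.heap, (t, next(self.tie), v, event))

    def pop(self):
        """Return (time, event) for the next valid event, or None if none is left."""
        while self.heap:
            t, _, v, event = heapq.heappop(self.heap)
            if self.version.get(event) == v:  # skip entries that were redrawn
                self.now = t
                return t, event
        return None
```

After applying a popped event, the caller would call `schedule` again for that event and for every event whose rate changed, relying on the memory-less property to justify the redraw.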


Performance optimization. The implementation of the nascent and internal events will get
computationally intensive as soon as $\delta$ and $\rho$ get larger than 10 and 5 respectively. Even for
smaller values of $\rho$, we recommend precomputing and remembering the set of possible
subpaths for a given local configuration so as to speed up the uniform picking of the
reconfiguration. As subpaths are clamped at both ends, this should not impact
the memory too much. Note however that if a free path (with only one end clamped)
belongs to the hexagon, it might require too much computation time as
there might be too many possible configurations to consider (as it could be as long as
$\Theta(\rho^2)$ if it fills a constant fraction of the hexagon). In this situation, we recommend
allowing only the $\delta$ last beads of the free path to be rerouted.
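
The precomputation recommended above can be sketched for a single clamped subpath. The sketch makes several assumptions that are not in the text: axial coordinates, a hexagon interior defined by a lattice distance strictly smaller than the radius, other subpaths represented as a `blocked` set of positions, and memoization through `functools.lru_cache`. Rerouting several subpaths jointly would need more than this.

```python
from functools import lru_cache

# Six unit directions of the triangular lattice in axial coordinates (assumed encoding).
DIRS = [(1, 0), (0, 1), (-1, 1), (-1, 0), (0, -1), (1, -1)]


def neighbors(p):
    return [(p[0] + dx, p[1] + dy) for dx, dy in DIRS]


def hex_distance(p, c):
    dx, dy = p[0] - c[0], p[1] - c[1]
    return max(abs(dx), abs(dy), abs(dx + dy))


@lru_cache(maxsize=None)
def clamped_reroutings(start, end, n_inner, center, radius, blocked=frozenset()):
    """All self-avoiding routes of `n_inner` beads strictly inside the hexagon
    that link the two clamped extremities `start` and `end`."""
    routes = []

    def extend(route):
        tip = route[-1] if route else start
        if len(route) == n_inner:
            if end in neighbors(tip):
                routes.append(tuple(route))
            return
        for q in neighbors(tip):
            if hex_distance(q, center) < radius and q not in blocked and q not in route:
                extend(route + [q])

    extend([])
    return tuple(routes)
```

A uniform reconfiguration of that subpath is then a single `random.choice` over the cached tuple of routes.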
4 Conclusion


In this article, we have proposed a new model which includes randomly occurring 
local reconfigurations in the oritatami model. As demonstrated by the experimental 
implementation of the tile assembly model [12, 14], taking this natural
phenomenon into account in the model is a necessity to design successful in-vitro
implementations. We hope to implement this model as open source software soon, and are
eager to explore whether the basic structures that made computation possible in the
oritatami model, such as glider, switchback, folding meter and pocket, can be made
tolerant to such thermal noise. One may also wonder if such reconfiguration could be
exploited to discover new ways of computing using co-transcriptional molecular folding.
Abstract A diverse array of theoretical models of DNA-based self-assembling systems
have been proposed and studied. Beyond providing simplified abstractions in
which to develop designs for molecular implementation, these models provide plat- 
forms to explore powers and limitations of self-assembling systems “in the limit” 
and to compare the relative strengths and weaknesses of systems and components 
of varying capabilities and constraints. As these models often intentionally overlook 
many types of errors encountered in physical implementations, the constructions can 
provide a road map for the possibilities of systems in which errors are controlled with 
ever greater precision. In this article, we discuss several such models, current work 
toward physical implementations, and potential future work that could help lead 
engineered systems further down the road to the full potential of self-assembling 
systems based on DNA nanotechnology. 


1 Introduction 


Beginning as a branch of mathematics, computer science is fundamentally focused 
on understanding the process of “computing” and how it can be embodied in phys- 
ical systems. Mainstream studies largely focus on digital computers, composed of 
electronic circuits operating on Boolean logic. However, the underlying concepts 
of processing and transforming information via specified rules, or algorithms, can 
also be realized in many other formats (e.g., analog computing devices [1], quantum 
computers [2], etc.), including natural systems that provide continued inspiration 
for novel directions in engineering (such as information processing networks in 
cells [3]). 

At the foundation of computational theory is the existence of “universal computers,”
which are computing devices capable of running any possible program. These
allow for the design of systems which can be given input in the form of an arbitrary 
program along with arbitrary data to be given to that program and which output the 
result of running the input program on the input data. This is in contrast to having to
design a specific computer for each program that one wants to run and provides for 
an immense diversity of programmable behavior for such systems. 

Relatively quickly after the structure of DNA was revealed [4] and its basic prop- 
erties began to be understood, computer scientists started to see its potential as a 
programmable substrate [5-7] which could yield additional directions in computing.
Harnessing the combinatorial powers of DNA molecules opened the door for engi- 
neered computing at the nanoscale. While biology utilizes DNA as an information 
storage medium, scientists and engineers began to see its potential as a structural 
component (e.g., [8-11]) and a platform for computing (e.g., [12-14]). This has
enabled the design of systems whose target behaviors and outputs include structure 
building and computing, and through a combination of those, interfacing with chem- 
ical and biological systems to process information about them and/or to generate 
output that becomes integrated within them [15-17]. 

Self-assembling systems are systems composed of relatively simple components 
which begin in a disorganized state and autonomously combine to form more complex 
structures. There are already many different approaches to realizing DNA-based self- 
assembling systems as platforms for structure building guided by computation, and 
one job of theoreticians has been to abstractly model them and to determine, in their 
mathematical limits, the strengths and weaknesses of each, especially as compared 
to each other. This includes an emphasis on those which may have more viable 
molecular implementations, and an additional aim has been to search for techniques 
which can be used to circumvent or at least minimize errors that occur in physical 
implementations (i.e., behaviors that deviate from those predicted by the high-level 
mathematical abstractions). In this paper, we seek to outline several of the key aspects 
of various abstract models of DNA-based self-assembling systems that have been 
studied and discuss the theoretical pros and cons they provide as well as the current, 
and potential future, development of DNA-based systems capable of implementing 
them. The continued growth of both the theoretical and experimental sides of DNA 
nanotechnology, with each relying upon the other for insights and direction, can lead 
to the implementation of more complex and diverse systems that more fully realize 
the theoretical potential of self-assembling systems. 

To cover the wide diversity of models and techniques that have been explored, 
this paper is organized as follows. In Sect. 2, we cover some preliminary definitions, 
and in Sect. 3, we define some of the main metrics often used for comparison across 
models and systems. In Sect. 4, we discuss one spectrum across which systems may 
vary, that of how many times copies of each individual component type appear in 
a self-assembled structure, and demonstrate ways in which trade-offs in different 
metrics occur across that spectrum. In Sect. 5, we discuss a wide variety of methods 
that can be used to provide the input to a self-assembling system and direct its output 
and demonstrate ways in which those methods have been implemented and their 
corresponding trade-offs as well as current technical limitations. In Sect. 6, we cover 
a set of varying dynamical behaviors that systems and/or individual components of 
systems may have, and how those behaviors can influence the powers and limitations 
of the systems, as well as current experimental implementations and potential future 
implementations that may more fully realize the powers provided by those dynamical
behaviors. Finally, in Sect. 7 we provide a brief summary and our optimism for the
future of this area.


2 Definitions and Notation 


In this section, we introduce a few basic definitions and some of the notation used 
throughout the following sections. 

An individual building block of a self-assembling system will be referred to as 
a tile, or sometimes as a monomer. Although a tile actually represents a component 
made out of one or more strands of DNA, for most of the theoretical models we 
will be discussing they will be abstractly represented as two-dimensional squares (or 
three-dimensional cubes in a few cases). An example schematic depiction of a tile 
can be seen in Fig. 1. 

Tiles are able to attach to each other via glues on their sides. A glue is an abstraction 
of a binding domain, which is typically a single-stranded portion of DNA that is free 
to bind to a strand containing the complementary sequence. Although the overall 
strength of binding of two complementary strands of DNA is highly dependent upon 
several factors (e.g., the specific sequences, their lengths, and the number of Gs 
and Cs as opposed to As and Ts), a common design goal is to create categories of 
binding domains which have very similar attachment strengths to all other binding 
domains within the same category. For this reason, it is possible in the abstract, 
mathematical models to instead refer to glues by their categories. Therefore, we will 
equate each strength category with a natural number and call a glue a “strength 1,” 
or “strength 2” glue, for instance. Strength 1 glues can be thought of as those whose 
binding strengths (with their complementary domains) are approximately equivalent 
to some system-specific basic, standard value. Strength 2 glues can be thought of 
as having approximately double that binding strength. As an additional abstraction, 
rather than referring to specific DNA sequences for glues, we will give each glue a 
text label (e.g., in Fig. 1, the north glue has label a and strength 1). A glue binds to 
its complement, and the complement of a glue label is represented with the prime 
character, e.g., the complement to a is a’. 
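
The abstractions above can be sketched as a small data structure. This is only an illustrative sketch, not a definition taken from any particular model: the names Glue, Tile, complement, and bond_strength are hypothetical, an ASCII apostrophe stands in for the prime character, and it is assumed that two glues bind only when their labels are complementary and their strength categories agree.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Glue:
    label: str     # text label standing in for a DNA sequence, e.g. "a" or "a'"
    strength: int  # natural-number strength category (1, 2, ...)


def complement(label: str) -> str:
    """Return the complementary label: a <-> a' (ASCII apostrophe as prime)."""
    return label[:-1] if label.endswith("'") else label + "'"


def bond_strength(g1: Optional[Glue], g2: Optional[Glue]) -> int:
    """Strength of the bond between two abutting glues, or 0 if they do not bind.

    Assumption: a glue binds only to its complement, and both sides must be in
    the same strength category. None represents the null glue.
    """
    if g1 is None or g2 is None:
        return 0
    if g1.label == complement(g2.label) and g1.strength == g2.strength:
        return g1.strength
    return 0


@dataclass(frozen=True)
class Tile:
    name: str                     # non-functional tile label
    north: Optional[Glue] = None
    east: Optional[Glue] = None
    south: Optional[Glue] = None
    west: Optional[Glue] = None


# The example tile of Fig. 1: label D, north a (strength 1), east b (strength 2),
# no south glue, west c' (strength 2).
tile_d = Tile("D", north=Glue("a", 1), east=Glue("b", 2), west=Glue("c'", 2))
```

With these definitions, the north glue of the Fig. 1 tile binds to a strength 1 glue labeled a', while the empty south side binds to nothing.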


Fig. 1 Schematic depiction of an example (square) tile. Each glue (a.k.a. binding domain) is 
represented by the text label and black rectangle(s) on one side. The number of black rectangles 
corresponds to the integer-valued strength of the glue. The north (i.e., top) side has a strength 1 glue 
of type a, the east (i.e., right) side has a strength 2 glue of type b, the south (i.e., bottom) side has 
no glue (or the null glue), and the west (i.e., left) side has a strength 2 glue of type c’. The entire 
tile has the label D 


An entire tile may also be assigned a label (e.g., D for the tile in Fig. 1). Such a label 
is non-functional with respect to the tile’s binding to other tiles, but can be used to categorize 
the tile and potentially represent some sort of marking on, or functionalization of, 
the tile. For example, tiles labeled with a 1 could have an attached molecule that 
makes them distinguishable during imaging from tiles labeled with 0 that have no 
such attached molecule. In this way, a readable pattern may be formed in an assembly 
by tiles labeled 1 and 0. 

Since we are focusing on systems which build structures, we will call the desired 
output of a system the target structure (or target assembly). Typically, there is a 
target shape that defines the two- or three-dimensional shape of the target structure. 


3 Metrics 


If models, and systems within them, are to be measured against each other, there 
must be metrics on which to base the comparisons. There are several such metrics by 
which self-assembling systems can be measured. While accuracy in producing the 
desired output and robustness to environmental conditions are of great importance, 
several other metrics may influence important aspects, such as the feasibility of 
implementation. 

We now define four metrics that we will focus on during our comparisons of 
models: 


1. Tile complexity: the number of unique tile types required, i.e., the number of 
unique kinds of monomers that serve as building blocks in a particular system. 
This is a measure of the amount of monomer reuse that is achieved and is discussed 
at greater length in Sect. 4. 

2. Monomer complexity: in general, the (maximum) size of individual monomer 
types. This is based upon the length and number of strands composing a single 
monomer type. Complexity may also more generally be derived from require- 
ments for complex shapes, rigidity, or dynamic behavior, as well as the difficulty of 
fabrication (i.e., the number of experimental steps required for their production). 

3. Resolution: the physical dimensions of a single coordinate location in the target 
shape (i.e., given the set of coordinates for the target shape’s voxels, how large is 
the volume of each in the actual target structure). Somewhat counterintuitively, 
constructions in theoretical models can often be shown to be more efficient in 
tile complexity (sometimes even provably optimal) by generating “scaled up” 
versions of 2D (resp. 3D) target shapes in which each pixel (resp. voxel) is 
replaced by a square (resp. cube) of potentially many tiles. (See Fig. 2 for an 
example.) 

4. Addressability: the ability to uniquely address locations in the target structure 
with specific binding domains. For many applications, greater specificity in the 
ability to address unique locations of the target structure is desired, since such 
locations can be used to precisely place molecules linked to the DNA strands. 


Fig. 2 (Left) Example shape S, (middle) S pixelated and at scale factor 1, (right) S at 
scale factor 2 
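
The notion of a scaled shape used in the resolution metric can be illustrated with a short sketch. The function name scale_shape and the representation of a 2D shape as a set of integer (x, y) cells are assumptions made here for illustration only.

```python
def scale_shape(shape, c):
    """Return the shape at scale factor c.

    Each unit cell (x, y) of the input is replaced by a c-by-c block of cells,
    so a shape with k cells becomes a shape with k * c * c cells.
    """
    return {
        (c * x + dx, c * y + dy)
        for (x, y) in shape
        for dx in range(c)
        for dy in range(c)
    }


# A three-cell "L" shape and the same shape at scale factor 2.
l_shape = {(0, 0), (1, 0), (0, 1)}
scaled = scale_shape(l_shape, 2)
```

At scale factor 2 the three-cell shape occupies twelve cells, which is why scaled constructions trade resolution for lower tile complexity.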


4 Monomer Reuse: Hard-Coded Versus Algorithmic 


A major design consideration for self-assembling systems deals with monomer 
reuse. For instance, in a system of self-assembling tiles, given the set of tile types, 
how many times does a tile of each type appear in the target structure? In this 
section, we investigate theoretical models and experimental implementations that 
vary greatly in monomer reuse and discuss related trade-offs and potential future 
directions. 

On one end of the spectrum, each type appears only one time in the target struc- 
ture. We refer to such structures as hard-coded, a.k.a. uniquely addressed, and note 
that this paradigm has been successfully experimentally employed in both DNA 
origami [9] and DNA bricks [8]. (See Fig. 3a for an abstract example.) The advan- 
tages of such systems include excellent addressability as well as robustness to some of 
the types of errors seen with monomer reuse (see Sect. 6.1). Disadvantages include 
maximal tile complexity and that the size of a target assembly is limited by the 
number and size of unique monomer types which can be created and utilized. For 
instance, in DNA origami one long DNA strand, referred to as the scaffold strand, 
winds through the entire structure and is bent into the target shape by many short 
staple strands that each bind to approximately two or three short sections of the 
scaffold strand, bringing and holding these distant parts closer together to eventually 
form the final shape. With the size of a DNA origami structure determined by the 
length of its scaffold strand, the design of custom scaffold strands (as opposed to the 
original standard scaffold strand, the M13mp18 bacteriophage’s genome) becomes 
important. Much progress is being made in this area [18-20], allowing for both 
smaller scaffolds, with only a few hundred nucleotides, and much larger scaffolds, 
with over 50,000. Continued improvements to allow greater diversity in scaffold 
sequences and lengths will further increase the range of structures producible via 
DNA origami. 

One alternative to hard-coded structures is the generation of (theoretically) 
unbounded periodic structures. In this case, each monomer type is reused arbitrarily 
often in a repeating pattern (see [21, 22] for experimental examples and Fig. 4 for 
abstract examples). Advantages of such systems include theoretically unbounded 
sizes for target structures, potentially low tile complexity, and robustness to nucle- 
ation errors (since it is valid for growth to begin from any location, see Sect. 6.1) and 
growth errors (see Sect. 6.1). However, disadvantages include limited addressability 
(restricted to periodically repeating locations) and growth of uncontrolled numbers 
of copies of relatively simple structures. 


Fig. 3 Depiction of hard-coded assembly versus algorithmic assembly. (a) Example with 
hard-coded (a.k.a. uniquely addressed) tiles. (b) Example with tile reuse using a trivial 
“filling” algorithm. (c) Zoomed out example of a 100 x 100 square using nearly optimal 
tile reuse 


Fig. 4 Examples of periodic structures, including (a) a grid of square tiles and (b, c) lattices 
formed by abstract non-square monomers 

An additional alternative is algorithmic growth in which individual monomer 
types may be used arbitrarily often (even in a possibly aperiodic manner) in each 
target assembly, as the attachment of each implicitly follows the rules of a designated 
algorithm. (See Fig. 3b for a basic example and Fig. 5 for a more complex example.) 
Advantages include the ability to (theoretically) create assemblies of arbitrary but 
bounded size using mathematically optimal tile complexity. Using information the- 
oretic and computational complexity arguments, it has been proven that algorithmic 
self-assembly systems in the aTAM are capable of universal computation [6], can 
achieve mathematically optimal tile complexities for systems constructing squares 
[23] and scaled versions of arbitrary finite shapes [24], and even include systems 
capable of universally simulating all other systems [25]. For instance, Fig. 3c shows 
a zoomed out image of a 100 x 100 square which self-assembled using an efficient 
algorithm for counting and filling. The green row is made up of 7 unique tile types, 
the yellow portion of only another 14, and the entire gray portion of only 6 unique 
tile types. The algorithm used to create the system can take an arbitrary positive 
integer n as input and output a system that self-assembles an n x n square using 
log(n) + 14 + 6 tile types, where the same 14 yellow and 6 gray tiles are used for all 
values of n, and an n-specific set of log(n) green tiles are used to encode a number 
related to n. The green tiles assemble into a row to which the 14 yellow tiles attach 
and execute the binary counting algorithm to grow to the necessary height, allowing 
the 6 gray “filler” tiles to fill in the rest of the square to precisely the dimensions 
n x n. While this construction uses approximately log(n) tile types for arbitrarily 
large n, in [26] it was shown that this can be improved to $\Theta(\log n / \log\log n)$, which was 
also shown to be the mathematical lower bound [23] for almost all values of n. Note 
that this is an exponential improvement over a hard-coded assembly, which even in 
the case of the 100 x 100 square, for example, would require 10,000 unique tile 
types, rather than the 27 used here. 
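
The tile counts quoted for the counter-based square can be reproduced with a few lines of arithmetic. This is a sketch under stated assumptions: the text gives the number of green tiles only as log(n), so the base-2 logarithm rounded up is assumed here, and the function names are hypothetical.

```python
def counter_square_tile_types(n: int) -> int:
    """Tile types for the counter-based n x n square described in the text.

    Assumption: the "log(n)" green seed-row tiles are taken to be
    ceil(log2(n)), computed with integer arithmetic. The 14 counter tiles
    and 6 filler tiles are the same for every n.
    """
    green = max(1, (n - 1).bit_length())  # ceil(log2(n)) for n >= 2
    return green + 14 + 6


def hard_coded_tile_types(n: int) -> int:
    """A uniquely addressed n x n square needs one tile type per location."""
    return n * n


for n in (100, 1000, 10000):
    print(n, counter_square_tile_types(n), hard_coded_tile_types(n))
```

For n = 100 this gives 7 + 14 + 6 = 27 tile types against 10,000 for the hard-coded square, matching the figures in the text.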

Disadvantages of algorithmic self-assembly include restricted addressability and, 
quite importantly, the potential for various types of errors to occur. It has been shown 
that for algorithmic growth to occur, some form of cooperation (see Sect. 6.1) is 
required, and this can lead to errors like those discussed in Sect. 6.1. Furthermore, 
several theoretical results require large scaling of shapes (see Sect. 3 for the definition, 
and an example, of a scaled shape). In some constructions, the scale factor is quite 
large, so each point of the shape requires a relatively large volume and is filled by a 
large number of tiles, with that number often depending upon, and growing with the 
size of, the specific target shape. This can also require tile sets that are significantly 
larger than can be currently successfully implemented. 

Developing large tile sets which self-assemble with few errors requires the design 
of large sets of orthogonal glue domains (i.e., sets of domains such that each has a 
strong affinity for its complementary domain, but very weak affinity with all others). 
The number of possible domains of any given length n (i.e., composed of n bases) 
is bounded by $4^n$ (since there are 4 bases to choose from in each location), but only 
a subset of those can be selected so that (1) they are orthogonal, and (2) the binding 
affinities of all complementary pairs are very nearly equivalent. The second condi- 
tion is important so that all glues have similar behaviors and becomes even more 
important if glues in different theoretical strength categories are to be implemented 
(see Sect. 6.1). Additionally, as glue domains are forced to become longer to accom- 
modate larger sets of glues, the potential for non-orthogonal interactions increases 
(i.e., glue domains may have positive binding affinities for domains other than their 
complements). Other commonly considered criteria include ensuring that sequences 
have minimal “self-structure” (i.e., they do not have a tendency for some subse- 
quences to bind to other subsequences on the same strand), avoiding G-tetraplexes 
(i.e., sequences of 4 Gs in a row), etc. Work has been done to create models that 
predict domain interactions and software that can perform automated design and 
theoretical testing of glue sets based on mathematical models of DNA strand inter- 
actions [27-32]. However, additional enhancements and extensions to this so-called 
sequence design process for glue domains have the potential to greatly improve the 
size and quality of glue domain sets and therefore the sizes of tile sets which can be 
successfully implemented. 
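
To make the selection criteria above concrete, the following toy sketch filters candidate domains greedily. It is not how the cited design tools work: Hamming distance is used here only as a crude stand-in for binding affinity (real tools rely on thermodynamic models and consider shifted alignments), equal GC count is a rough proxy for equal binding strength, and all function names and parameters are hypothetical.

```python
from itertools import product

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence that binds to seq under Watson-Crick pairing."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def gc_count(seq):
    return sum(b in "GC" for b in seq)


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(s, t))


def pick_domains(length, gc, min_dist):
    """Greedily select a toy set of 'orthogonal' domains.

    A candidate is kept only if it has exactly `gc` G/C bases (proxy for
    near-equal binding strength), contains no run of four Gs, and differs in
    at least `min_dist` positions from its own reverse complement, from every
    domain already chosen, and from every chosen domain's reverse complement.
    """
    chosen = []
    for bases in product("ACGT", repeat=length):  # all 4**length candidates
        seq = "".join(bases)
        if gc_count(seq) != gc or "GGGG" in seq:
            continue
        if hamming(seq, reverse_complement(seq)) < min_dist:
            continue
        if all(
            hamming(seq, c) >= min_dist
            and hamming(seq, reverse_complement(c)) >= min_dist
            for c in chosen
        ):
            chosen.append(seq)
    return chosen


domains = pick_domains(length=6, gc=3, min_dist=3)
print(len(domains), "domains selected out of", 4 ** 6, "candidates")
```

Even in this simplified setting only a fraction of the $4^n$ candidates survive, which illustrates why larger glue sets push designs toward longer domains.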


5 Inputs 


In this section, we discuss various methods of providing input to self-assembling 
systems and theoretical models and experimental implementations that utilize them. 
The goal is to understand different ways of controlling self-assembling systems and 
how we can improve that control in the future. 

We can think of the monomers of the system as representing the instructions of 
the program to be executed. On the one hand, we can design deterministic systems 
which are non-programmable, by which we mean that (within a valid range of envi- 
ronmental parameters) the system is designed to always produce the same output, 
i.e., structure, regardless of any (reasonable) variations in the environment. That is, 
the environment does not provide meaningful input to the system and the same “pro- 
gram” is always “executed”. On the other hand, systems may be designed so that 
there is some environmental variable which can be tuned, with each setting yielding 
a distinct output. There are many techniques for providing input to self-assembling 
systems, and we now describe a few of them. 


5.1 Seed Assemblies 


A frequently utilized input technique is the use of a seed assembly. A seed assembly is 
a variable, preformed structure that is added to the system in addition to a constant set 
of monomer types. In theoretical models such as the abstract Tile Assembly Model 
(aTAM) [6], there are computationally universal systems each consisting of a single, 
constant set of “universal” tile types such that for every possible program and input 
data pair, a seed assembly encoding that program and input data can be added to a 
solution containing those universal tiles, causing the system to build a representation 
of the computation of that program on that input data. Furthermore, that computation 
can also determine the resulting shape of the self-assembled structure. In such a 
scenario, the seed assembly is incorporated into the target assembly and provides 
the input that determines its resultant shape and/or pattern. As an example, see Fig. 5 
where an aTAM tile set is shown, as well as assemblies that self-assemble from two 
different seeds. In this aTAM example, the binding threshold (a.k.a. temperature) 
parameter is set to 2, meaning that tiles only attach to the seed, or the assembly 
containing the seed, if they can bind with at least one strength 2 glue, or two strength 
1 glues. (See Sect. 6.1 for additional details.) Due to the binding threshold and the 
glue patterns, the assemblies that grow from each seed vary in size. 

Current methods of experimentally implementing seed assemblies include using 
DNA origami as seeds with the tiles being either single-stranded or multi-stranded 
complexes that attach to, and grow away from, the origami seed [10, 28], or even 
just a single strand which serves as the seed for nucleation of growth [33]. 

Fig. 5 Example aTAM systems that use the same tile set but different seeds to provide inputs which 
yield differing assemblies 


Variants on the use of a seed assembly are also possible. For instance, a seed 
assembly could instead serve as a template to be filled in by the tiles (which later 
detach from it), or as a template of a shape to be replicated. However, using seed 
assemblies in such ways requires additional dynamics by which the target structures 
can be separated from the seed assemblies (see Sect. 6). 


5.2 Tile Subsets 


The variable input to a system could instead be the selection of only a specific subset 
of tile types from a larger set. In this case, a large set of tile types is designed and 
synthesized, and then for each particular target assembly, a specific subset is selected 
and added to the solution. This technique can be utilized to make target structures 
which are subsets of a larger structure, as in the DNA brick technique [8], or to 
select specific sequences of program instructions to be carried out from a generic set 
[28]. A disadvantage of this approach is the potential for large tile complexity, and 
an advantage is that simple selection and mixing of the necessary subset of types is 
sufficient to produce any of the potential systems. 


5.3 Monomer Concentrations 


Rather than strictly varying the monomers in a system in a binary way (i.e., they 
are either present or not), their relative concentrations can instead be varied. This 
technique, known as concentration programming, has been shown in a set of theoretical 
results [34, 35] to allow fine-grained control: by varying only tile concentrations, 
any finite shape can be targeted while using a constant tile set. However, this 
requires scaling of the target shape and thus a loss in resolution. Moreover, a source 
of great difficulty for experimental implementations of this approach is the precision 
required for very fine-tuned relative concentrations of tiles. Modern equipment capable 
of mixing nanoliters of fluid is allowing finer control of concentrations, and it may 
be feasible in the future to realize more of the potential of concentration programming. 


5.4 Programmed Temperature Fluctuations 


In systems of tile-based self-assembly where each tile is composed of multiple strands 
of DNA, common laboratory protocol involves a single-pot system with annealing, 
where the individual strands are put into a solution that is first brought to a high 
enough temperature to ensure complete dissociation of strands. Then, the solution is 
cooled to a temperature where the bonds between the individual strands comprising 
each tile are strong enough to allow for the formation of the tile complexes, but the 
binding domains that would bind tiles to each other are not strong enough to form 
long-lasting bonds. After holding at that temperature for a period of time that ensures 
most strands will be incorporated in tiles, the temperature is further lowered to a point 
at which tiles can bind to each other. 

Another theoretically powerful method of supplying input to a self-assembling 
system is to vary the temperature of the system through a series of prescribed tem- 
peratures, both raising and lowering it multiple times. This can allow periods of 
growth followed by periods of melting, which may remove some tiles and create 
favorable locations for different types of tiles in the same locations during periods 
of lowered temperatures. Theoretical modeling of this procedure, known as temper- 
ature programming [36], has been shown to allow for a constant tile set that can 
be programmed, via only temperature change sequences, to form any finite shape 
[37]. Trade-offs can be made between the number of temperature changes required to 
direct growth of a target shape versus the resolution of the shape. However, physical 
implementations have not yet been achieved due to the difficulty of designing bind- 
ing domains and systems with enough granularity to correctly bind and/or dissociate 
across a wide enough set of temperature levels. 

Future advancements in sequence design that allow for the development of glue 
domains that exhibit fine enough granularity in binding strengths to support multiple 
discrete levels of binding and melting (possibly combined with novel tile designs) 
may allow temperature programming to become a useful tool. 


5.5 Staged Assembly 


The number of experimental steps required to implement a self-assembling system 
can vary greatly. Single-pot single-stage systems exist where a set of strands is added 
to solution, all in one test tube, which is first heated and then annealed, and the target 
structures completely form during that process. Alternatively, systems exist in which 
multiple products are made in different tubes, then the products of subsets of those 
tubes are combined together (with each mixing process and then the simultaneous 
self-assembly processes in the separate tubes considered a “stage”), and the number 
of stages can be quite large. (A simple example is shown in Fig. 6.) The simplicity of 
single-pot single-stage systems is beneficial from an experimental perspective, but 
in the theoretical Staged Tile Assembly Model [38, 39] it has been shown that tile 
complexity can be exchanged for stage complexity. That is, the number of tile types 
required to build shapes can be dramatically reduced (even down to a constant tile 
set for an infinite set of shapes) by increasing the number of stages. Experimental 
work which combines staged assembly and hierarchical growth (see Sect. 6.2) has 
also demonstrated the great power of this paradigm [40]. 


Fig. 6 Example staged assembly system. Individual tiles are added to the top tubes in the 
first stage, then assemblies from the top tubes are mixed for the next stage, etc. 


Combined with methods that cause tile detachments as well (see Sect. 6.4), theo- 
retical results in staged assembly have shown the possibility for new system behaviors 
such as the replication of input shapes or patterns [41-43] or the marking of input 
assemblies that match specific shapes [44]. 

The theoretical benefits of staged assembly come with a high cost on the experi- 
mental side. First, there is the additional work of carefully mixing the distinct inputs 
to each tube of each stage. Although automation can make this much easier, two 
difficulties remain: knowing how long the self-assembly at each stage should be 
allowed to proceed, and extracting only the correctly completed products of each 
tube of each stage for further mixing. Theoretical work has been done to 
study staged assembly systems in which the correctly completed products of each 
tube have maximal size difference from all others [45], but this technique (and those 
which effectively realize the tile complexity benefits of staged assembly) requires use 
of hierarchical assembly (see Sect. 6.2). Future work that realizes the full theoretical 
benefits of staged assembly would require improvements in purification techniques 
to make it easier to select only correctly completed products from each stage, as well 
as improved design and control of hierarchically self-assembling systems. 


6 Dynamics 


In this section, we investigate the consequences of a variety of changes to the dynam- 
ical behaviors allowed by different models and/or tile designs. By categorizing a 
variety of dynamical behaviors and demonstrating their powers in theoretical mod- 
els, and also discussing experimental work that has begun to implement several, we 
hope to provide a road map for future work that can further realize the powers offered 
by these behaviors while effectively balancing the trade-offs. 


6.1 Cooperativity 


It was long speculated [23, 46] and recently proven [47] that algorithmic self- 
assembly cannot occur in the aTAM without behavior known as cooperation. Typi- 
cally, we say a tile attaches to an assembly using cooperation when its initial binding 
to that assembly requires it to form bonds with more than one tile that is already part 
of that assembly. Note that biochemistry literature sometimes uses the term avidity 
to refer to the same concept. We can consider the binding domains which initially 
bind when a tile attaches as its “input” domains and the remaining domains (which 
may later serve to allow for the binding of additional tiles) as “output” domains. 
Intuitively, cooperation forces the attaching tile to “read” information from two sep- 
arate tiles via their output domains, and careful design of the tile types of a system 
can ensure that the information encoded in the output domains of tiles represents 
specific logical transformations of the input information (e.g., the output domains 
could encode the bit resulting from the logical AND operation performed on the bits 
represented by the input domains, as shown in Fig. 7). It has been shown that for an 
arbitrary program, it is possible to design a set of tile types such that they are forced, 
in the theoretical setting, to self-assemble in a pattern that follows the execution of 
that program [6]. 


Fig. 7 Example of cooperative tile attachment: On the left is an assembly consisting of three tiles 
that make a “corner” which allows for cooperative attachment of a tile that matches the two glues 
labeled “1” and “0”. On the right, four tiles implementing AND logic (with input glues on the left 
and bottom and outputs on the top and right). Only the tile labeled “10” can cooperatively attach to 
the assembly 

In the aTAM [6], the requirement for cooperative binding is captured by a system 
parameter called the temperature, which is physically based upon factors such as 
the temperature of the system as well as the concentration of tile monomers. This 
temperature parameter is also commonly referred to as the binding threshold, and 
in the discrete formalization of the aTAM, it is commonly set as either 1 or 2. A 
value of 1 means that the binding of a single input domain is always sufficient to 
allow a tile to “permanently” attach to a growing assembly. A value of 2 means that 
either at least two input domains must correctly bind, or a single input domain of 
at least double strength (i.e., a strength 2 glue) must bind for a tile to attach. The 
theoretical power of aTAM systems with the temperature parameter equal to 2 has 
been proven to be quite impressive, including algorithmic self-assembly capable of 
the natural simulation of any possible program, the self-assembly of structures using 
mathematically minimal tile complexity, etc. 
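
The binding-threshold rule can be sketched in a few lines. This is a minimal illustration, not a full aTAM simulator: tiles are plain dictionaries from sides to (label, strength) pairs, a glue is assumed to bind to an identical (label, strength) pair as with the bit-labeled glues of Fig. 7, and the corner assembly, the glue labels "x" and "y", and all function names are hypothetical.

```python
# For each side of a tile: offset to the neighboring position, and the side of
# that neighbor which faces back toward the tile.
NEIGHBORS = {
    "N": ((0, 1), "S"),
    "E": ((1, 0), "W"),
    "S": ((0, -1), "N"),
    "W": ((-1, 0), "E"),
}


def binding_strength(assembly, pos, tile):
    """Sum the strengths of all glues of `tile` that match adjacent tiles."""
    total = 0
    for side, ((dx, dy), facing) in NEIGHBORS.items():
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        if neighbor is None:
            continue
        mine, theirs = tile.get(side), neighbor.get(facing)
        if mine is not None and mine == theirs:
            total += mine[1]
    return total


def can_attach(assembly, pos, tile, threshold):
    """A tile may attach at an empty position if it binds with at least `threshold`."""
    return pos not in assembly and binding_strength(assembly, pos, tile) >= threshold


def and_tile(w, s):
    """Tile with input bits on the west and south and AND(w, s) on north and east."""
    out = str(int(w) & int(s))
    return {"W": (w, 1), "S": (s, 1), "N": (out, 1), "E": (out, 1)}


and_tiles = {w + s: and_tile(w, s) for w in "01" for s in "01"}

# A three-tile "corner" presenting a 1 from the west and a 0 from the south
# at the open position (1, 1).
assembly = {
    (0, 0): {"N": ("x", 2), "E": ("y", 2)},
    (0, 1): {"S": ("x", 2), "E": ("1", 1)},
    (1, 0): {"W": ("y", 2), "N": ("0", 1)},
}

fits_t2 = sorted(k for k, t in and_tiles.items() if can_attach(assembly, (1, 1), t, 2))
fits_t1 = sorted(k for k, t in and_tiles.items() if can_attach(assembly, (1, 1), t, 1))
print(fits_t2)  # ['10']
print(fits_t1)  # ['00', '10', '11']
```

At threshold 2 only the tile whose two inputs both match can attach, whereas at threshold 1 any tile with a single matching input can, so the position no longer "reads" both neighbors.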

Unfortunately, the physical reality (as it often does) differs from the theoretical 
model and sometimes experimental systems designed to behave as temperature 2 
aTAM systems do not behave as such. In some cases, tile attachments occur in which 
tiles “temporarily” bind via a single strength 1 glue and the glue in the location 
intended to be the second input has an incorrect, “mismatching” glue domain which 
does not bind (or does so only partially, with low strength). (See Fig. 8 for an exam- 
ple.) Although such attachments are not expected to last long, with some nonzero 
probability a neighboring location may receive a tile which binds to the “erroneous” 
tile with one of its input domains, causing both tiles to be attached to the assembly 
and each other with enough binding strength to be “permanent”. We will call this 
a growth error, and in such a case, the erroneous tile may corrupt the computation 
being performed and cause the algorithmic growth to proceed incorrectly. This type 
of behavior is captured in the more physically realistic kinetic Tile Assembly Model 
[6], a.k.a. kTAM, and kTAM modeling has helped lead to several proofreading and 
error suppression techniques (where errors are considered to be tile attachments that 
differ from the expected tile attachments in the aTAM) that have been developed to 
reduce the prevalence of such errors [48-52]. It is notable that aTAM behavior can 
be approximated arbitrarily closely by the kTAM, and careful control of temperature 
and tile concentrations along with proofreading can help to the extent that the incidence 
of such errors in experimental systems has been decreased to around 0.017% 
[53] (or to 0.03% for larger tile sets [28]). However, even those seemingly excellent 
rates are still too high for accurate algorithmic growth of even moderately complex 
(from a theoretical perspective) systems. 


Fig. 8 Example of a tile binding with a glue mismatch, leading to an algorithmic error. On the left, 
a set of “computing” tiles. On the right, an assembly to which the green tile (but no other) can bind 
with strength 2 (which happens on the bottom path). On the top path, the yellow tile binds with 
strength 1 and a mismatched glue. Before it detaches (due to weak binding), another tile may bind 
and “lock in” the algorithmic error 
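
A rough back-of-the-envelope calculation suggests why the error rates quoted above remain too high. The sketch below assumes, purely for illustration, that each tile attachment errs independently with the quoted probability; the source does not state this model, and the function name is hypothetical.

```python
def error_free_probability(n_tiles: int, error_rate: float) -> float:
    """Probability that all n_tiles attachments are correct.

    Assumption: each attachment is an independent trial that fails with
    probability error_rate, so the assembly is error-free with probability
    (1 - error_rate) ** n_tiles.
    """
    return (1.0 - error_rate) ** n_tiles


# Per-attachment error rates quoted in the text, as fractions.
rate_small_set = 0.017 / 100  # 0.017%
rate_large_set = 0.03 / 100   # 0.03%

# A 100 x 100 square contains 10,000 tiles.
p_small = error_free_probability(100 * 100, rate_small_set)
p_large = error_free_probability(100 * 100, rate_large_set)
print(f"{p_small:.3f} {p_large:.3f}")
```

Under this simplified model, fewer than one in five 100 x 100 assemblies would be error-free at the 0.017% rate, and roughly one in twenty at 0.03%.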

Cooperation has been shown to be necessary for algorithmic self-assembly, but 
it has also been proven that there are methods of cooperation other than the specific 
“glue cooperativity” already discussed. Other ways of causing the attachment of a tile 
to depend on two or more others, called weak cooperativity [54], have been shown in 
theoretical results using geometric hindrance [55, 56] and repulsive forces [57, 58]. 
To utilize geometric hindrance, theoretical systems with the temperature parameter 
set to 1 can be designed where tiles have shapes other than squares [55, 59, 60] (or 
in addition to squares [56]) so that the tiles that can correctly bind into a location are 
selected by matching a single glue for binding as well as having a complementary 
geometric shape (serving as the second input) that matches the second input location. 
To utilize repulsive forces, instead of relying on a complementary geometry as the 
second input, tile design can include the specific placement of a tile element which 
will experience a repulsive force when adjacent to another instance of that element 
on a neighboring tile. In this way, only a tile of a type which does not cause repulsive 
elements to align will be able to bind into a new location. 

While the aforementioned methods of weak cooperation provide a perhaps 
stronger barrier to growth errors than glue cooperation does, by actively 
preventing incorrect tile binding rather than simply not favoring it, another problem 
arises in temperature 1 systems. This problem, known as spurious nucleation, 
stems from the fact that in a temperature 1 system, all glue bonds are individually 
enough to cause two tiles to bind. Algorithmic self-assembly requires growth to 
begin from a very particular state, usually from either a seed assembly (see Sect. 5.1) 
which is large enough to provide a location where one or more tiles can bind via 
two attachments to it, or a set of “hard-coded” input tiles that can bind to each other 
via sufficiently strong bonds to form an assembly that can then function like a larger 
seed assembly. From a carefully defined input and using cooperative attachments, 
tiles of relatively few different types can combine in complex algorithmic patterns 
with many copies of each tile type appearing throughout the growing assembly. 
However, in a temperature 1 system, growth can be initiated apart from any seed 
assembly with pairs of tiles “nucleating” growth that can then proceed to follow 
patterns corresponding to arbitrary subsets of algorithmic growth continuing from 
random inputs. This spuriously nucleated, unstructured algorithmic growth leads to 
the formation of “junk” structures and is therefore a fatal flaw for such systems. Great 
experimental work using cooperativity to control nucleation has been done in [61], 
and a schematic representation of their results (using both single-stranded DNA tiles 
and DNA origami-based tiles) is shown in Fig. 9. By using two planes in the third 
dimension, the “crisscross slat” tiles are able to extend further than square tiles and 
bind to a greater number of neighboring slats when attaching. Future work leveraging 
such expanded cooperative growth to prevent spurious nucleation, especially across 
wider temperature ranges, may improve the ability to control seeding in algorithmic 
systems. 
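
The difference between temperature 1 and temperature 2 attachment can be made concrete with a small sketch. The following Python fragment is only an illustration under simplifying assumptions: the glue labels, the `GLUE_STRENGTH` table, and the helper names `binding_strength` and `can_attach` are hypothetical and are not taken from any of the cited works. It shows that a single strength-1 bond lets two free tiles pair up at temperature 1 (the starting point of spurious nucleation), while at temperature 2 a tile needs two cooperating inputs such as those offered by a seed.

```python
from typing import Dict, Optional, Tuple

Position = Tuple[int, int]
Tile = Dict[str, Optional[str]]          # side ("N", "E", "S", "W") -> glue label

# Hypothetical glue strengths, chosen only for this illustration.
GLUE_STRENGTH: Dict[str, int] = {"a": 1, "b": 1}

# For each side of the candidate tile: neighbor offset and the neighbor's facing side.
NEIGHBORS = {"N": ((0, 1), "S"), "E": ((1, 0), "W"),
             "S": ((0, -1), "N"), "W": ((-1, 0), "E")}


def binding_strength(assembly: Dict[Position, Tile], pos: Position, tile: Tile) -> int:
    """Sum the strengths of all glues that match between `tile` and its neighbors."""
    total = 0
    for side, ((dx, dy), facing) in NEIGHBORS.items():
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        glue = tile.get(side)
        if neighbor is not None and glue is not None and glue == neighbor.get(facing):
            total += GLUE_STRENGTH[glue]
    return total


def can_attach(assembly: Dict[Position, Tile], pos: Position, tile: Tile,
               temperature: int) -> bool:
    """A tile may attach at an empty position if its binding strength meets the temperature."""
    return pos not in assembly and binding_strength(assembly, pos, tile) >= temperature


# Two free tiles sharing one strength-1 glue: no seed is involved.
lone_tile = {(0, 0): {"E": "a"}}
partner = {"W": "a"}
print(can_attach(lone_tile, (1, 0), partner, temperature=1))  # True: spurious nucleation
print(can_attach(lone_tile, (1, 0), partner, temperature=2))  # False: one bond is not enough

# An L-shaped seed offers two cooperating inputs at position (1, 1).
seed = {(0, 0): {}, (1, 0): {"N": "a"}, (0, 1): {"E": "b"}}
print(can_attach(seed, (1, 1), {"S": "a", "W": "b"}, temperature=2))  # True
```

The sketch models only the attachment threshold; it does not capture geometric hindrance, repulsive forces, or the kinetics of real DNA tiles.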

Although “temperature 2” growth in experimental systems using glue coopera- 
tion helps restrict algorithmic growth to beginning from designated seeds, it requires 
careful design of glue domain strengths and careful control of actual system tem- 
perature and tile concentrations. Even so, these systems still suffer from growth 
errors, even after previously mentioned proofreading techniques are incorporated. 
Additionally, while weakly cooperative temperature 1 systems also allow for algorithmic 
growth and have the potential for reducing growth errors by more actively 
preventing attachments of incorrect tiles, they instead suffer more greatly from the 
problem of spurious nucleation. In order to realize the full potential of algorithmic 
self-assembly, systems designed to incorporate both types of cooperative behavior 
may be useful. For instance, if tile motifs could be designed such that geometric 
hindrance occurs in the case of glue mismatches while also providing glues to be 
used for glue cooperation enforced by temperature 2, future designs utilizing DNA 
origami-based tiles (e.g., [62]), or perhaps clever designs of smaller complexes, may 
have the potential to move forward the state of the art in algorithmic self-assembly. 
Additionally, experimental implementation of temperature 2 growth requires either 
“double-sized tiles” (such as in [10] where double-sized tiles are effectively two 
square tiles permanently bound together, allowing growth to extend outward from 
one row of growth into a new row) or the design of sets of glues with carefully separated 
groups representing strength 1 and strength 2 glues (as previously discussed 
in Sect. 5.1), and advances in sequence design would help in this effort. Yet another 
Fig. 9 Crisscross slat tiles (green and blue rectangles) assembling with a binding threshold of 6, 
which enforces high levels of cooperativity. Binding domains are shown as small colored squares 
(and although they are actually on the side of the green slats facing the blue slats, they are shown 
for clarity). Each attaching slat must bind to at least 6 others. The empty horizontal outline shows 
where the next slat attachment is possible, and the vertical outline shows where there will then be 6 
domains allowing for the attachment of the following slat, leading to a new location for a horizontal 
slat, etc. 


potential direction for advancement may come from further development of proof- 
reading techniques and error suppression mechanisms, which have already proven 
to be very useful. 


6.2 Single Tile or Hierarchical Growth 


The aTAM and models derived from it are based on dynamics of single-tile attach- 
ment, i.e., at each step of the assembly process, a single-tile monomer attaches to 
a growing assembly. An alternative to this allows assemblies of arbitrary size (i.e., 
composed of arbitrary numbers of individual tiles) to combine with each other. This 
is often modeled as hierarchical assembly in which a system begins self-assembly 
from a collection of individual “singleton” tiles which can combine with each other 
in pairs, and then those assemblies can combine with each other, etc., allowing up to 
a doubling of assembly size with each combination. A commonly studied theoretical 
model of this process is called the Two-Handed Assembly Model (2HAM) as it is 
based upon the intuition that one already produced assembly could be taken in each 
hand, and the pair could then be combined to form a new, larger assembly. 
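
The "up to a doubling" remark can be turned into a short worked bound. The symbols below are introduced here only for illustration and are not notation from the cited models: $A_k$ denotes a largest assembly that can exist after $k$ rounds of pairwise combination, and $N$ is the size of a target assembly.

```latex
\[
  |A_0| = 1, \qquad |A_k| \le 2\,|A_{k-1}|
  \quad\Longrightarrow\quad |A_k| \le 2^{k},
\]
\[
  \text{so an assembly of } N \text{ tiles requires at least }
  k \ge \lceil \log_2 N \rceil \text{ rounds of combination.}
\]
```

This counts hierarchical levels only; it says nothing about how long each level takes in a physical system.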
Fig. 10 Abstract example of hierarchical assembly (showing a portion of the infinite Robinson 
tiling pattern). From left to right, smaller assemblies combine to form larger assemblies, which then 
combine with each other to form the next hierarchical level 


Hierarchical assembly occurs in biology (i.e., the constituent pieces of amino acids 
combine, then those amino acids are combined to form proteins, and the proteins 
then combine to form cellular structures) and has even been cleverly demonstrated in 
DNA-based experimental systems (e.g., [40, 63, 64]). Theoretical work has shown 
that, in general, 2HAM systems are capable of making a greater diversity of struc- 
tures and utilizing lower tile complexity than systems in the aTAM [65]. However, 
somewhat counterintuitively, in [66] it was proven that (under physically realistic 
assumptions based on molecular counts applied to the abstract 2HAM) no asymp- 
totic speedup is actually achievable over single-tile growth. Nonetheless, the 2HAM 
remains a very interesting model in which the dynamics allow for the theoretical 
designs of systems which efficiently (in terms of tile complexity) produce complex 
shapes. System design in these theoretical constructions tends to make heavy use of 
geometric hindrance, where the interfaces along which pairs of assemblies may bind 
have carefully designed patterns of “bumps” and “dents” that allow for great discrim- 
ination between which pairs of assemblies can bind to each other, while allowing the 
numbers of unique glue domains to remain very low (often a relatively small constant 
number across constructions capable of targeting any particular structure among an 
infinite collection). This has been demonstrated in theoretical results [67-69] as well 
as experimentally [62, 63, 70] (Fig. 10). 

For future experimental work to implement additional theoretical constructions, 
a wide variety of improvements will most likely be necessary. To leverage the use of 
geometric hindrance, it will be necessary for assemblies to remain rigid, at least along 
binding surfaces, but in many constructions those interfaces may be quite long. With- 
out sufficient rigidity, portions of the assembly which should block the attachment of 
incorrect assemblies may bend to allow those attachments. Prior experimental work 
with hierarchical assembly [40] showed a relatively sharp drop-off of the rates of cor- 
rect completion of steps of the assembly process. This seriously restricts the potential 
complexity of designed systems, and efforts to improve those rates would be valuable. For 
instance, as the previously mentioned theoretical work of [66] showed, a roadblock 
to the assembly of later steps can be the multitude of assemblies of earlier steps (com- 
plete and/or incomplete) that simultaneously exist in solution. As steps progress, the 
number of assembly types, or species, quickly grows since not all growth progresses 
at the same rate, and this makes the likelihood of a pair of complementary assemblies 
of a later step encountering each other drop precipitously. Future improvements in 
the ability to relatively quickly, easily, and correctly purify the products of various 
steps may allow for a higher concentration of correctly matching assemblies from 
the same step and allow assembly to progress correctly at higher rates. 


6.3 Activatable/Deactivatable Glues 


In the aTAM and many similar models, the tiles are “static,” meaning they can be 
thought of as components whose properties do not change once they bind to an assem- 
bly or at any time afterward. However, many DNA-based nanotechnologies are based 
largely upon dynamic reactions such as strand displacement [71-73]. When strand 
displacement mechanics are incorporated into tile-based self-assembly, it is possible 
to make tiles whose binding domains turn “on” and “off.” This has been experimen- 
tally prototyped [74] and theoretically modeled [75-77], with tiles developed such 
that the binding of one glue on a tile can cause other glue domains on that tile to 
either become “active” (i.e., they were previously sequestered but then uncovered) 
or “inactive” (i.e., they go from either bound or able to bind to being sequestered 
such that they can no longer bind and any bond they previously formed with another 
tile is broken). See Fig. 11 for an example. 




Fig. 11 Example of a “signal-passing” tile in two levels of abstraction. Assembly proceeds from 
the bottom to the top. On the left, the green tile initially has its left glue partially sequestered, but the 
unpaired c’ domain provides an additional matching domain that allows it to bind more favorably 
with the glue on the yellow tile. That causes the previously bound strand to detach, which is then 
able to bind with the top strand on the right side, exposing the right-side glue. On the right, a further 
abstraction depicts the same process: The e glue of the green tile is initially “inactive”. When the 
c' glue binds, it causes the “firing” of the signal which eventually “activates” the e glue 
Theoretical constructions with so-called signal-passing tiles have shown that not 
only can they self-assemble structures whose shapes are impossible to self-assemble 
with static tiles (e.g., the discrete self-similar fractal called the Sierpinski triangle 
cannot self-assemble in the aTAM [78], but it can self-assemble using signal-passing 
tiles [75]), but they can also perform universal computation without requiring the 
assembly to be as large as the product of the time and space requirements of the com- 
putation. That is, with static tiles, all steps of the computation must be permanently 
represented within the final assembly, so an assembly in which a computation occurs 
using n bits in each of m computation steps requires n - m tiles to be permanently 
attached. However, with signal-passing tiles it is possible for the glues of tiles to 
deactivate after they have participated in a step of a computation and thus for tiles to 
detach after facilitating a computational step and for assemblies to remain smaller 
while performing computations. The demonstrated theoretical power of systems of 
signal-passing tiles is in several ways greater than that of the aTAM, and although 
many constructions make use of tiles which have high signal complexity (i.e., many 
signal pathways across the same tile), theoretical work has also shown that by scaling 
up target shapes [79], signal complexity can often be brought down to only 2 signals, 
allowing for relatively simple tiles to exhibit the greatly enhanced power of signal 
passing. 

Although some experimental work has been done with signal-passing tiles [74], 
in that work, only a single signal passed across each tile and glue deactivations 
were not used. In order to expand the use of signals, larger tile motifs will likely 
be required, but (small) DNA origami structures could potentially provide a good 
platform. The process of passing signals from one glue to another when the first binds 
could be implemented using techniques similar to those of “surface chemical reaction 
networks” [80-82] where strand displacement cascades are used to transmit the 
signals. Although the complexity of individual tiles implemented in this way would 
be much greater than simple single-stranded tile (SST) motifs (i.e., tile designs which 
use a single strand of DNA per tile), if even an additional fraction of the algorithmic 
control possible with signal-passing tiles could be realized, that increased complexity 
has the potential to be justified. Furthermore, using DNA origami as the tile body 
also provides the potential for integrating geometric hindrance as a tool, adding even 
more control and error suppression to algorithmic growth. 


6.4 Tile Removal and Breaking of Assemblies 


The ability for tiles that previously joined together in an assembly to detach from 
each other at designated points allows not only for new dynamics but also for new 
categories of targeted behaviors. For instance, it becomes possible to develop theo- 
retical systems which take as input a structure that already has the desired shape and 
then to produce assemblies having that same shape [41, 43], or to replicate patterns 
encoded into assemblies [42, 53]. It also becomes possible to design theoretical sys- 
tems capable of attaching to the perimeters of input assemblies if and only if they 
match a particular shape [44]. Experimental work has even succeeded in showing 
how the fracturing of assemblies can serve as the basis for the replication of patterns 
[53]. 

Theoretical models that allow for glue detachment include signal-passing tiles (see 
Sect. 6.3), the melting of subsets of weaker glue bonds via increased temperatures 
(see Sect. 5.4), and the dissolution of a subset of tiles within an assembly (for instance, 
in systems with tiles made of both DNA and RNA, the RNA-based tiles could be 
dissolved via an RNase enzyme [43]). 

Development of systems leveraging the additional possibilities enabled by tile 
detachment and the breaking apart of assemblies will require overcoming the hurdles 
discussed for robustly implementing signal-passing tiles or temperature program- 
ming, or techniques such as incorporating RNA-based tiles into systems with DNA 
tiles and successfully dissolving them while leaving the DNA tiles intact. Also, for 
many of the theoretical constructions, greater control of hierarchical self-assembly 
will be required. 


6.5 Reconfiguration Via Flexibility 


When cellular machinery builds the wide variety of proteins encoded by genes, even 
though only a small number of amino acids are used as the building blocks, the 
diversity of protein shape and functionality that results is astonishing. Since we 
know the sequences of the genes and their mappings to the amino acid sequences, it 
may seem that it should be easy to predict those properties of proteins. However, as 
amino acids are attached one at a time, the forming chain folds upon itself in a complex 
three-dimensional pattern influenced by several types of molecular interactions. This 
process turns out to be computationally intractable to predict in general [83]. In 
contrast, DNA origami utilizes a rational design approach toward folding which 
starts with the desired shape to self-assemble and then develops a routing path for a 
scaffold strand that can then be folded into that path by staple strands. 

Following nature’s example, a cotranscriptional approach to utilizing folding with 
tiles based on RNA has been developed [84, 85]. A generalization of this process 
has been captured in the theoretical model called oritatami [86] (see Fig. 12 for an 
example), which has been shown to allow for universal computation [87] and to have 
strong shape-building abilities [88]. While RNA seems to be the natural medium for 
such systems, perhaps some future DNA-based work could use related techniques. 

A different approach has been taken by theoretical models [89, 90] which have 
been developed to at least partially mimic and capture similar folding behaviors, and 
unsurprisingly, it is intractable to compute most interesting properties of systems in 
these models, despite their more discrete nature. For instance, tiles in the Flexible 
Tile Assembly Model (FTAM) [89] are considered to be rigid bodies, but they are 
allowed to have flexible bonds with their neighbors. The physical inspiration for the 
theoretical FTAM is the way that protein folding can allow chains of amino acids to 
rapidly explore possible configurations and adopt those that are (relatively) optimal. 




Fig. 12 Example of an oritatami system, where a chain of “beads” is transcribed, one at a time. (In 
this case, from a to f.) Each bead type can be designed to be able to form bonds with some subset 
of other bead types. As beads are transcribed, the chain folds (on the triangular grid, a portion of 
which is shown in red) to form a maximal number of bonds (shown here as dashed green lines). On 
the left, the chain has not folded to maximize bonds, but on the right it has 




Fig. 13 Example of a reconfigurable assembly in the flexible tile assembly model. The glues 
between square tiles are flexible, similar to hinges, and allow the flat structure on the left to fold 
into the hollow box on the right 


For reconfigurations that are not excessively large, it seems likely that this process of 
reconfiguration and exploration can proceed more quickly than bimolecular reactions 
which require the diffusion of new monomers for attachment, and that perhaps even 
displacement and reconfiguration of previously bound subassemblies may be possible 
to engineer, enabling shape-changing assemblies (Fig. 13). 

A potential DNA-based implementation could achieve flexible bonds between 
tiles by including unpaired nucleotides on one or both sides of glues which have bound 
(with the bound portions forming rigid helices). This could allow for bound tiles or 
even subassemblies to change positions relative to other portions of an assembly, 
and thus, it may be possible to design algorithmic self-assembling systems which 
form reconfigurable assemblies that can be designed to first take one shape and 
can then reconfigure into a differently shaped assembly by the addition of just a 
few strands that displace targeted glue strands, or different environmental signals 
such as the concentration of a particular molecule (like MgCl2, as was demonstrated 
experimentally in [91]) or pH (as shown experimentally in [92]) (Fig. 14). 


6.6 Assembly Growth Controlled by CRNs 


Chemical reaction networks (CRNs) are composed of sets of reactions, each of which 
has a set of reactant chemical species that react to produce a set of product chemical 
A + B → X + X 
A + X → A + A 
B + X → B + B 


Fig. 14 Example CRN whose input is a set of A and B molecules. Each molecular species is 
denoted by a different letter. Each reaction is specified by a different line. For example, in the first 
reaction one copy of an A molecule combines with one copy of a B molecule to produce two copies 
of the X molecule. This CRN computes which input is in the majority. The system evolves to its 
output by increasing the count of the majority input while decreasing that of the minority input, 
since the reactant in the majority will be more likely to interact with both the Xs and the other 
reactant 


species. A set of such reactions which are chained together by having the outputs of 
one reaction act as the inputs of another can define a network capable of complex 
behaviors. Theoretical work has shown that arbitrary CRNs can be implemented as 
sets of DNA complexes [93] and that has led to an entire branch of DNA nanotechnol- 
ogy based upon the design of artificial CRNs, including programming languages that 
compile digital circuits into DNA complexes [94]. While the goal of such systems 
is typically centered around the integration of computing logic with chemical and/or 
biological systems rather than structure building, there has also been research which 
ties the two together. Although tile assembly can also be described by chemical reac- 
tions that model the combination of an assembly and a tile to form a larger assembly, 
the geometry of the forming structure helps define which tiles may attach. Also, tiles 
are neither transformed nor consumed (at least in models such as the aTAM). More 
general CRNs do not consider geometry of structures and also allow for reactants 
to be consumed and/or converted into other species (while perhaps also consum- 
ing “fuel” species and creating “waste” species). The combination of DNA-based 
implementations of these more general CRNs with tile assembly systems (theoret- 
ically [95-97] and experimentally [98]) provides the ability to have the growth of 
assemblies controlled by computations performed by a set of general CRN reactions 
that can be based upon time delays, the presence or lack of specific inputs, or even 
feedback based on the growth of assemblies themselves by adjusting concentrations 
and/or counts of tiles used during the assembly process. The “signals” produced by a 
CRN in this case are global in nature, potentially influencing any or all of the assem- 
blies growing in parallel, while the control provided by the signals of signal-passing 
tiles (see Sect. 6.3) is local in nature, impacting only the growth of the assembly on 
which a signal is initiated. 
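
The behavior of the three-reaction CRN of Fig. 14 can be explored with a short stochastic simulation. The sketch below is not taken from the cited works: it assumes unit rate constants for all three reactions, tracks only the order of reaction events rather than physical time, and the function name `simulate_majority` is hypothetical. At each step a reaction is chosen with probability proportional to the product of its reactant counts.

```python
import random
from typing import Tuple


def simulate_majority(a: int, b: int, seed: int = 0,
                      max_steps: int = 1_000_000) -> Tuple[int, int, int]:
    """Run the CRN  A+B -> X+X,  A+X -> A+A,  B+X -> B+B  until no reaction can fire.

    Returns the final counts (A, B, X). Unit rate constants are an assumption.
    """
    rng = random.Random(seed)
    x = 0
    for _ in range(max_steps):
        p1 = a * b          # propensity of A + B -> X + X
        p2 = a * x          # propensity of A + X -> A + A
        p3 = b * x          # propensity of B + X -> B + B
        total = p1 + p2 + p3
        if total == 0:      # at most one species remains, so nothing can react
            break
        r = rng.uniform(0, total)
        if r < p1:
            a, b, x = a - 1, b - 1, x + 2
        elif r < p1 + p2:
            a, x = a + 1, x - 1
        else:
            b, x = b + 1, x - 1
    return a, b, x


final_a, final_b, final_x = simulate_majority(60, 40)
# Every reaction preserves the total molecule count.
print(final_a + final_b + final_x)  # 100
```

Because the process is random, a large initial majority makes the corresponding outcome likely but not certain, and a near tie can end in either species or, in small systems, with only X molecules remaining.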

As the development of both DNA-based CRNs and tile-based self-assembly sys- 
tems continues to mature, there is great potential for control of structure-forming 
systems by CRNs whose input can be delivered by a wide array of mechanisms, 
including (but not limited to) the presence of targeted molecules in the environment. 
Combined with reconfigurable assemblies (see Sect. 6.5), systems could be designed 
to release cargoes, expose previously sequestered functionalized surfaces, or perform 
other environmentally responsive behaviors. 
7 Conclusion 


We have summarized a wide variety of theoretical models of self-assembling systems 
that were primarily developed to provide high-level mathematical abstractions and 
give insights into the effects of varying aspects of components (e.g., sizes, shapes, 
rigidity, binding affinities, etc.) and model dynamics (e.g., methods of growth and/or 
breaking of assemblies, cooperativity, etc.). Some of these insights have already pro- 
vided guidance to experimental designs, and we hope the models will continue to 
evolve and mature alongside the design and engineering techniques of DNA nan- 
otechnology. Theoretical modeling can provide a framework that shows which prop- 
erties of components and systems are needed for desired resultant behaviors and guide 
researchers in the right direction as they work to develop new molecular components 
and techniques. Additionally, it can serve as a foundation to categorize potential 
behaviors of newly developed components and dynamical behaviors made possible 
in the laboratory. 

There is a symbiotic relationship between theory and experiment, and thus it 
also remains important that theory incorporates up-to-date knowledge of experi- 
mental roadblocks and challenges, which can then be used for the development of 
new models and theoretical studies. The rapid growth and great success of DNA 
nanotechnology have been achieved in part due to strong ties between theory and 
experiment, and conferences like the “International Conference on DNA Computing 
and Molecular Programming” [99] have been integral in building and maintaining 
this connection. We look forward to seeing where future developments will lead and 
are optimistic that many of the powers of self-assembling systems displayed within 
the theoretical domain will be realized in physical systems, and this theoreticians’ 
toolkit for building self-assembly systems will come closer to reality. 
Reasoning As If 


Jack H. Lutz and Robyn R. Lutz 


Abstract It is occasionally useful to reason as if something were true, even when 
we know that it is almost certainly not true. We discuss two instances, one in dis- 
tributed computing and one in tile self-assembly, and suggest directions for further 
investigation of this method. 


1 Introduction 


Great breakthroughs in science have a way of perforating traditional disciplinary 
boundaries. Ned Seeman’s development of structural DNA nanotechnology [1, 2] 
is a case in point. Over the past forty years, researchers from most disciplines of 
science and engineering have been drawn into the thrilling creativity of this new field. 
This is most obviously motivated by opportunities for applying methods from many 
disciplines to the development of DNA nanotechnology. Often, however, benefits 
of such applications also accrue back to the disciplines from which they come, as 
the novel aspects of DNA technology force improvements of these methods and our 
understanding of them. 

In this note, we discuss an example of this phenomenon, the use of a method 
from distributed computing theory [3] in the theory of DNA tile self-assembly [4-6], 
perhaps the earliest abstract theory to emerge from DNA nanotechnology. Both 
of these subjects deal with systems that evolve non-deterministically over time and 
are so complex that it is not feasible to explore all the individual trajectories along 
which they might evolve. In the case of distributed computing, the example of inter- 
est here consists of a large number of processors that asynchronously and non- 
deterministically pass messages to one another. In the case of tile self-assembly, the 
example of interest consists of a large two-dimensional structure that grows via the 
asynchronous, non-deterministic adsorption of various types of tiles (abstractions of 
DNA double-crossover molecules [7]) onto various locations along the boundary of 
the structure. In both examples, the population (number of processors or number of 
tiles) is assumed to be very large, and the number of trajectories of the system is far 
larger than its population. 

The method that we discuss, which we call reasoning as if, has not been precisely 
formalized, and we will not succeed in formalizing it here. Our goal is simply to 
discuss two striking examples of the method, one in distributed computing and one 
in tile self-assembly, and to suggest directions for future research on the method. 

The rest of this note is organized as follows. In Sect. 2 we review the snapshot 
algorithm of Chandy and Lamport [8], emphasizing the reasoning-as-if nature of this 
algorithm. In Sect. 3 we review the local determinism method of Soloveichik and 
Winfree [9], again emphasizing the reasoning-as-if nature of the method. In Sect. 4 
we sketch research directions that we hope will lead to the further development of 
as-if reasoning as a useful method. 


2 The Snapshot Algorithm 


The snapshot algorithm was designed by Chandy and Lamport [8], and a colorful 
description of it, which we follow here, was provided by Dijkstra [10]. In it the 
algorithm assembles a “snapshot” of a possible but unlikely global system state in 
order to reason about the properties of the system. What makes this algorithm relevant 
to our focus here is that the system snapshot that is assembled is almost certainly not 
real. It is an imaginary state that probably never occurred in the system’s execution. 
Its power lies in its uncanny and provable mirroring of the global properties of the 
system as if the snapshot were real. 

Chandy and Lamport use the analogy of photographers watching a sky filled with 
migrating birds to explain the algorithm [8]. The scene is vast, and the birds are in 
constant motion; no single photo suffices. Only a composite photo from snapshots 
taken by different photographers at different times and then thoughtfully pieced 
together can capture the whole scene. 

Similarly, the snapshot algorithm uses as-if reasoning to enable us to determine 
properties of the global state of a distributed system. As the algorithm executes, it 
assembles a description of a snapshot state. When it terminates, we can query the 
snapshot state. Any stable predicate that holds in the assembled snapshot state holds 
for the system. An example of a stable predicate, from [11], is “the number of tokens 
traveling the network equals 7.” A stable predicate is one that, once it holds, holds 
forever. 

A network is represented as a directed, finite, strongly connected graph in which 
each machine is a node and each edge is a first-in first-out buffer. A computation 
is specified by a sequence of atomic actions by individual machines. Each action 
updates the machine’s state, accepts at most one message on each of its input buffers, 
and sends at most one message on each of its output buffers. A global system state 
is determined by the state of each machine and the messages in each buffer. 
The snapshot state is assembled from a snapshot of each machine, taken locally 
and at different times, and the assembled snapshot state is a fiction. However, any 
stable predicate that is true for the snapshot state is true for the system’s final state. 

Coloring. Initially, all nodes (machines) and edges (messages) are white. At ter- 
mination, all machines and messages are red. Each machine turns red once, which 
Dijkstra calls “blushing,” sending white messages before turning red and red mes- 
sages after turning red [11]. No red message is ever accepted by a white machine. 

Assembling the Snapshot State. The snapshot state consists of the state of each 
machine as it turns red and the white messages accepted by the machine after it turns 
red. We designate one machine as the seed. It turns itself red and sends a special 
message, called a Marker, on each of its output buffers. It then records any white 
messages it receives while red. It knows that its local snapshot is complete when it 
has received a Marker on each of its input buffers. Each subsequent machine, when 
it receives a Marker on an input buffer, similarly turns red and sends a Marker on 
each of its output buffers. Since each machine is reachable, all machines turn red in 
finite time. 

Rewriting History. The algorithm rewrites history—“by magic,” according to 
Dijkstra—so that the snapshot state is the point at which all machines turn red. To 
do this, whenever there is a red action followed by a white action, they must be 
from different machines, and it interchanges them, iterating until all white actions 
precede all red actions. The snapshot state is the cut in this rewritten history, that is, 
the point at which all subsequent actions are red and all prior actions are white. The 
snapshot algorithm thus uses local snapshots taken at each machine to construct an 
imaginary but possible global state that enables us to determine system properties 
as if the snapshot state were real. 
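
The marker-passing procedure described above can be sketched as a small simulation. This is a minimal illustration under stated assumptions, not the authors' or Dijkstra's formulation: machines do nothing but pass indistinguishable tokens, actions are picked at random, and the names `run_snapshot`, `blush`, and `step` are hypothetical. The point is that the local records, taken at different times, add up to a token count that matches the real system even though the assembled state may never have occurred.

```python
import random
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[int, int]
MARKER = "MARKER"   # special message; every other message is one token


def run_snapshot(edges: List[Edge], tokens: Dict[int, int], seed_machine: int = 0,
                 warmup: int = 20, rng_seed: int = 0):
    """Simulate token passing on a strongly connected graph and take one snapshot."""
    rng = random.Random(rng_seed)
    tokens = dict(tokens)                       # current local state of each machine
    chan = {e: deque() for e in edges}          # one first-in first-out buffer per edge
    outs = {m: [e for e in edges if e[0] == m] for m in tokens}
    red = set()                                 # machines that have already blushed
    recorded_state: Dict[int, int] = {}
    recorded_chan = {e: 0 for e in edges}       # white tokens accepted by red machines
    marker_seen = set()                         # edges whose Marker has been accepted

    def blush(m: int) -> None:
        """Turn machine m red: record its state and send a Marker on every output."""
        red.add(m)
        recorded_state[m] = tokens[m]
        for e in outs[m]:
            chan[e].append(MARKER)

    def step() -> None:
        """One atomic action on a random edge: accept its head message or send a token."""
        e = rng.choice(edges)
        src, dst = e
        if chan[e] and rng.random() < 0.5:
            msg = chan[e].popleft()
            if msg == MARKER:
                marker_seen.add(e)
                if dst not in red:
                    blush(dst)
            else:
                tokens[dst] += 1
                if dst in red and e not in marker_seen:
                    recorded_chan[e] += 1       # a white message crossing the cut
        elif tokens[src] > 0:
            tokens[src] -= 1
            chan[e].append("token")

    for _ in range(warmup):                     # ordinary computation before the snapshot
        step()
    blush(seed_machine)
    while len(marker_seen) < len(edges):        # until every input buffer delivered a Marker
        step()
    snapshot_total = sum(recorded_state.values()) + sum(recorded_chan.values())
    return snapshot_total, recorded_state, recorded_chan


ring = [(0, 1), (1, 2), (2, 0), (0, 2)]
total, states, in_flight = run_snapshot(ring, {0: 3, 1: 2, 2: 2})
print(total)  # 7: the snapshot preserves the number of tokens in the network
```

The sketch relies on the first-in first-out buffers: any token sent before a machine blushes is ahead of that machine's Marker, so it is counted either in the receiver's recorded state or as an in-flight white message, and never twice.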

Uses. In the nearly 30 years since Dijkstra saluted the snapshot algorithm as “beau- 
tiful,” it has been widely used to check global properties in distributed systems, e.g., 
a computation terminating, as well as for monitoring and debugging [8, 12]. For 
example, checkpointing and rollback recovery are key mechanisms of fault-tolerant 
distributed systems [13]. They enable systems that crash to recover and resume execu- 
tion from a stored state rather than needing to restart from scratch. In checkpointing, 
each device in the system stores its local state, so that in rollback recovery after a 
failure, the node’s state can be restored. Nakamura et al. point out that
checkpoint-rollback recovery “inherently contains a snapshot algorithm to record the nodes’
checkpoints in such a way that they compose a consistent global state” [14]. 


3 Local Determinism 


The abstract Tile Assembly Model (aTAM) is an idealized model of molecular self- 
assembly on a two-dimensional surface. Winfree et al. [15] introduced the first form 
of the model, which was based on DNA double-crossover molecules [1] and was 
already Turing universal (i.e., able to simulate arbitrary computations). Winfree [4] 
developed the model further, and Rothemund and Winfree [5, 6] refined it to its
present-day form. The aTAM has been an active area of research in the molecular
programming community ever since. 

A tile type in the aTAM is a unit square that cannot be rotated, so that it has 
well-defined “north,” “west,” “south,” and “east” sides. Each of these sides is labeled 
with a glue type that we may take to be a non-negative integer. Each glue type has a 
strength that is 0, 1, or 2. 

A tile assembly system (TAS) is an ordered pair $\mathcal{T} = (T, s)$, where $T$ is a finite
set of tile types and $s \in T$ is the seed tile type. An assembly sequence of $\mathcal{T}$ is a finite
or infinite sequence

$$\vec{\alpha} = (\alpha_0, \alpha_1, \alpha_2, \ldots)$$

of assemblies $\alpha_i$ satisfying the following conditions.

1. $\alpha_0$ is the assembly consisting of a single tile of type $s$.

2. For each $i$, if the assembly sequence $\vec{\alpha}$ is long enough for $\alpha_{i+1}$ to exist, then
$\alpha_{i+1}$ is obtained by adding a single tile of some type $t \in T$ to the assembly $\alpha_i$.
Moreover, this tile is added by abutting it to one or more tiles of $\alpha_i$ in such a way
that the sum of the strengths of the glues on the tile whose types match the glue
types of $\alpha_i$ that they abut is at least 2.

3. If $\vec{\alpha} = (\alpha_0, \ldots, \alpha_k)$ is finite, then the assembly $\alpha_k$ is terminal in the sense that no
tile of any type $t \in T$ can be added to $\alpha_k$ as in condition 2.

4. If $\vec{\alpha} = (\alpha_0, \alpha_1, \alpha_2, \ldots)$ is infinite, then it is (strongly) fair in the sense of
distributed computing [16, 17], which means that, if a tile can be added to an
assembly $\alpha_i$ at some location $l$, then there is some $j \geq i$ such that $\alpha_{j+1}$ is obtained from
$\alpha_j$ by adding a tile at location $l$.


Several things should be noted about the above definition. First, in any assembly
sequence $\vec{\alpha} = (\alpha_0, \alpha_1, \alpha_2, \ldots)$, each assembly $\alpha_i$ consists of exactly $i + 1$ tiles.
Second, condition 3 requires an assembly sequence $\vec{\alpha}$ to “go on for as long as it
can.” Third, once tiles are added to an assembly, they do not move. This ensures
that every assembly sequence $\vec{\alpha}$ has a well-defined result, which is an assembly
denoted $\mathrm{res}(\vec{\alpha})$. If $\vec{\alpha} = (\alpha_0, \ldots, \alpha_k)$ is finite, then $\mathrm{res}(\vec{\alpha})$ is the finite assembly $\alpha_k$. If
$\vec{\alpha} = (\alpha_0, \alpha_1, \ldots)$ is infinite, then $\mathrm{res}(\vec{\alpha})$ is the minimal infinite assembly that has each
$\alpha_i$ as a subassembly in the obvious sense. Fourth, the fairness condition 4 ensures that
the result of an infinite assembly sequence must, like the result of a finite assembly
sequence, be terminal in the sense defined in condition 3.
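The definition can be made concrete with a small simulator. The sketch below is illustrative only: the glue labels, their strengths, and the six tile types (which grow a 3 x 3 square from the seed) are assumptions invented for the example, as are the names `binding_strength`, `attachable`, and `assemble`. It produces one finite assembly sequence by repeatedly adding a randomly chosen tile that binds with total strength at least 2, stopping when the assembly is terminal.

```python
import random

# Illustrative glue strengths and tile set (assumptions, not taken from the text).
STRENGTH = {"u1": 2, "u2": 2, "r1": 2, "r2": 2, "a": 1, "b": 1}
TILES = {  # tile type -> {side: glue label}; a missing side carries no glue
    "seed": {"N": "u1", "E": "r1"},
    "R1": {"W": "r1", "E": "r2", "N": "a"},
    "R2": {"W": "r2", "N": "a"},
    "U1": {"S": "u1", "N": "u2", "E": "b"},
    "U2": {"S": "u2", "E": "b"},
    "F": {"S": "a", "W": "b", "N": "a", "E": "b"},
}
DIRS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPPOSITE = {"N": "S", "E": "W", "S": "N", "W": "E"}


def binding_strength(assembly, pos, tile):
    """Sum the strengths of the glues of `tile` that match abutting glues at `pos`."""
    total = 0
    for side, (dx, dy) in DIRS.items():
        neighbor = assembly.get((pos[0] + dx, pos[1] + dy))
        glue = TILES[tile].get(side)
        if neighbor is not None and glue is not None \
                and glue == TILES[neighbor].get(OPPOSITE[side]):
            total += STRENGTH[glue]
    return total


def attachable(assembly):
    """All (position, tile type) pairs that may be added under condition 2."""
    frontier = {(x + dx, y + dy) for (x, y) in assembly
                for dx, dy in DIRS.values()} - set(assembly)
    return [(p, t) for p in sorted(frontier) for t in sorted(TILES)
            if binding_strength(assembly, p, t) >= 2]


def assemble(rng):
    """Return one assembly sequence as a list of assemblies (dicts position -> type)."""
    assembly = {(0, 0): "seed"}
    sequence = [dict(assembly)]
    while True:
        options = attachable(assembly)
        if not options:          # terminal: condition 3 is satisfied
            return sequence
        pos, tile = rng.choice(options)
        assembly[pos] = tile
        sequence.append(dict(assembly))


sequence = assemble(random.Random(0))
```

Running `assemble` with different random seeds yields different assembly sequences; for this particular tile set they all end in the same nine-tile square.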

Many aTAM investigations involve design problems. In such a problem, there is a
target assembly $\alpha^*$, and the task is to design a tile assembly system $\mathcal{T} = (T, s)$ such
that every assembly sequence $\vec{\alpha}$ of $\mathcal{T}$ has $\alpha^*$ as its result. Moreover, $\alpha^*$ is often very
large, and it is typically desirable to (i) have the number $|T|$ of tile types in $\mathcal{T}$ be
much smaller than the number of tiles in $\alpha^*$ and (ii) exploit the inherent parallelism
of chemistry by having it be typical for an assembly $\alpha_{i+1}$ in an assembly sequence $\vec{\alpha}$
for $\alpha^*$ to be just one of many assemblies that could have been obtained by adding a
tile to $\alpha_i$. These two things together imply that a solution $\mathcal{T} = (T, s)$ for the design
problem of causing $\alpha^*$ to reliably self-assemble typically spawns an exceedingly
large number of assembly sequences $\vec{\alpha}$, all with result $\alpha^*$.
In fact, anyone who has designed non-trivial tile assembly systems knows all too
well how easy it is to design a tile assembly system $\mathcal{T}$ in which the target assembly $\alpha^*$
results from many or most, but not all, assembly sequences $\vec{\alpha}$ of $\mathcal{T}$. When erroneous
designs of this type occur, it is typically because the designer envisions a particular
assembly sequence $\vec{\alpha}$ with result $\alpha^*$, but some alternative assembly sequence $\vec{\alpha}'$ with
a different result can occur (often because of a “race condition” in which growth in
$\vec{\alpha}'$ can block envisioned growth in $\vec{\alpha}$).

The local determinism method of Soloveichik and Winfree [9] is a verification
method that is also a beautiful solution to the above design problem. The key to
this method is a definition of what it means for an assembly sequence $\vec{\alpha}$ of a tile
assembly system $\mathcal{T}$ to be locally deterministic. The details of this definition need
not concern us here except to note its key properties. The definition is subtle, but it
is simple enough that it is typically routine to verify that a given assembly sequence
$\vec{\alpha}$ is locally deterministic (if it is). The definition does not require the underlying tile
assembly system $\mathcal{T}$ to be deterministic: A locally deterministic assembly sequence
$\vec{\alpha}$ is typically just one of a huge number of assembly sequences that can occur in $\mathcal{T}$.

The remarkable main theorem about local determinism is that, if $\vec{\alpha}$ is a locally
deterministic assembly sequence of a tile assembly system $\mathcal{T}$, then every assembly
sequence $\vec{\alpha}'$ of $\mathcal{T}$ has the same result, i.e., satisfies $\mathrm{res}(\vec{\alpha}') = \mathrm{res}(\vec{\alpha})$ [9].
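The conclusion of the theorem, that every assembly sequence has the same result, can be checked by brute force on very small systems. The sketch below does not implement the local determinism condition itself (whose definition is not given here); it simply enumerates every reachable assembly and collects the terminal ones. The two tile sets, the glue labels, and the names `successors` and `terminal_results` are hypothetical. `GOOD` grows a 2 x 2 square; `FLAWED` adds a rival tile `X` that races with `F` for the last position.

```python
DIRS = {"N": (0, 1), "E": (1, 0), "S": (0, -1), "W": (-1, 0)}
OPPOSITE = {"N": "S", "E": "W", "S": "N", "W": "E"}
STRENGTH = {"u": 2, "r": 2, "a": 1, "b": 1, "c": 1}   # illustrative glue strengths

GOOD = {
    "seed": {"N": "u", "E": "r"},
    "R": {"W": "r", "N": "a"},
    "U": {"S": "u", "E": "b"},
    "F": {"S": "a", "W": "b"},
}
# A rival tile that fits wherever F fits creates a race for position (1, 1).
FLAWED = dict(GOOD, X={"S": "a", "W": "b", "N": "c"})


def successors(tiles, assembly):
    """All assemblies obtained from `assembly` by one legal tile addition."""
    placed = dict(assembly)
    frontier = {(x + dx, y + dy) for (x, y) in placed
                for dx, dy in DIRS.values()} - set(placed)
    result = []
    for pos in frontier:
        for name, glues in tiles.items():
            total = 0
            for side, (dx, dy) in DIRS.items():
                neighbor = placed.get((pos[0] + dx, pos[1] + dy))
                glue = glues.get(side)
                if neighbor is not None and glue is not None \
                        and glue == tiles[neighbor].get(OPPOSITE[side]):
                    total += STRENGTH[glue]
            if total >= 2:
                result.append(assembly | {(pos, name)})
    return result


def terminal_results(tiles):
    """Set of results over all assembly sequences (finite systems only)."""
    start = frozenset({((0, 0), "seed")})
    seen, stack, results = {start}, [start], set()
    while stack:
        assembly = stack.pop()
        nxt = successors(tiles, assembly)
        if not nxt:
            results.add(assembly)
        for b in nxt:
            if b not in seen:
                seen.add(b)
                stack.append(b)
    return results


good_results = terminal_results(GOOD)
flawed_results = terminal_results(FLAWED)
```

For `GOOD` the enumeration finds a single terminal assembly, while `FLAWED` has two. Exhaustive enumeration of this kind quickly becomes infeasible as systems grow, which is the situation in which a property checkable on one assembly sequence becomes valuable.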

This local determinism theorem says that a designer of a tile assembly system $\mathcal{T}$
for a target assembly $\alpha^*$ who envisions a particular tile assembly sequence $\vec{\alpha}$ of $\mathcal{T}$
with result $\alpha^*$ may safely reason as if $\vec{\alpha}$ is the assembly sequence that occurs, even
if the likelihood of $\vec{\alpha}$ occurring is vanishingly small, provided only that the designer
also verifies that $\vec{\alpha}$ is locally deterministic. Since it is natural for designers to think
in terms of an envisioned assembly sequence (“First this substructure will assemble,
then this one ...”), local determinism is a very designer-friendly method. Moreover,
this friendliness arises directly from the fact that the method entitles the designer to
reason as if a convenient, envisioned, and unlikely assembly sequence occurs.

To put the associated verification problem fancifully, if one wants to prove that all
roads (assembly sequences) lead to Rome (the target assembly), it suffices to exhibit
a fancy (locally deterministic) road and show that this leads to Rome.

But even more is true. Soloveichik and Winfree’s proof also shows that, if there
is a fancy road, then all roads are fancy. That is, if a tile assembly system $\mathcal{T}$ has
a locally deterministic sequence $\vec{\alpha}$, then all assembly sequences of $\mathcal{T}$ are locally
deterministic. The method is thus even more designer friendly than indicated above,
since any assembly sequence that the designer chooses will be locally deterministic
(if some assembly sequence is).


4 The Future of As If 


Structural DNA nanotechnology opened up the fascinating field of molecular
programming. The challenges of this new world have pushed participating computer
scientists to develop new techniques for dealing with complex interactions of geometry
and non-determinism at scales surpassing prior computing experience and rivaling
those of the coming Internet of Things. One of these techniques, local determinism, 
is an instance of reasoning as if, a method rarely used in science and still not well 
understood. Here, we have reviewed local determinism together with the snapshot 
algorithm, an earlier instance of as-if reasoning, in the hope of promoting investiga- 
tion of this method. 

We briefly suggest future directions for research on as-if reasoning in molecular 
programming. First, can local determinism be usefully adapted from the abstract Tile 
Assembly Model to the other models of molecular programming? Obvious candi- 
dates here include other models of tile self-assembly such as the kinetic Tile Assem- 
bly Model (kTAM) [4] and the Two-Handed Assembly Model (2HAM) [18]. More 
ambitious possibilities include chemical reaction networks (CRNs) [19], the CRN- 
directed Tile Assembly Model (CRN-TAM) [20, 21], and thermodynamic binding 
networks [22]. 

More general questions also present themselves. Without delving too deeply into 
philosophy [23], can we more clearly formalize as-if reasoning? A concrete start 
would be to formalize what it is that the snapshot algorithm and local determinism 
have in common. 

Finally, local determinism is, roughly speaking, a condition that allows us to rea- 
son from the possibility of a target structure self-assembling to the necessity of that 
target structure self-assembling. Modal logic, originally developed to reason about 
possibility and necessity [24], has become a powerful tool in distributed comput- 
ing [25, 26]. Could it provide a means of formalizing—and strengthening—as-if 
reasoning? 

In his book, Ned graciously wrote of molecular programming researchers that 
“more than any other, this community has contributed valuable ideas and workers 
to the field of structural DNA nanotechnology,” and that “this parallel field is one of 
the key drivers of structural DNA nanotechnology” [2]. We are delighted to work in 
this parallel field, which would not exist without Ned, but we know that he is in the 
driver’s seat, and that is the way we want it.


Biochemical Circuits


Scaling Up DNA Computing
with Array-Based Synthesis
and High-Throughput Sequencing


Yuan-Jyue Chen and Georg Seelig 


Abstract It was 40 years ago today, when Ned taught DNA to play [32]. When Ned 
Seeman began laying the theoretical foundations of what is now DNA nanotechnol- 
ogy, he likely did not imagine the entire diversity and scale of molecular structures, 
machines, and computing devices that would be enabled by his work. While there 
are many reasons for the success of the field, not least the creativity shown by Ned 
and the community he helped build, such progress would not have been possible 
without breakthroughs in DNA synthesis and molecular analysis technology. Here, 
we argue that the technologies that will enable the next generation of DNA nan- 
otechnology have already arrived but that we have not yet fully taken advantage of 
them. Specifically, we believe that it will become possible, in the near future, to 
dramatically scale up DNA nanotechnology through the use of array-synthesized 
DNA and high-throughput DNA sequencing. In this article, we provide an example 
of how DNA logic gates and circuits can be produced through enzymatic processing 
of array-synthesized DNA and can be read out by sequencing in a massively parallel 
format. We experimentally demonstrate processing and readout of 380 molecular 
gates in a single reaction. We further speculate that in the longer term, very large- 
scale DNA computing will find applications in the context of molecular diagnostics 
and, in particular, DNA data storage. 


1 Introduction 


Over the last four decades, structural DNA nanotechnology became intertwined with 
and provided a foundation for a range of other fields, most notably dynamic DNA 
nanotechnology and DNA computing. Adleman’s original demonstration of DNA 
computing took advantage of the inherent parallelism of chemical reactions and the
predictability of DNA base pairing to solve an instance of the directed Hamiltonian
path problem [1]. However, it soon became apparent that DNA computing would not
be able to compete with electronic computers because of the exponentially large 
amounts of DNA required for solving problems out of reach of conventional com- 
puters [11]. Researchers subsequently explored alternative computing paradigms, 
from tile-assembly systems, an area that Seeman made foundational contributions 
to [3, 29, 34, 43], to Boolean logic gates [25, 31, 37, 38], neural networks [13, 20, 
27, 42, 45] or chemical reaction networks [6, 7, 33]. Importantly, the goal of this 
more recent work is not to compete with electronics but to efficiently process infor- 
mation that is already available in molecular form or to use embedded computation 
to organize molecules into shapes and patterns. 


1.1 Scaling up DNA Computing for Molecular Diagnostics 


The potential of DNA computing to analyze information encoded in biological 
nucleic acids, in particular single-stranded RNA, was recognized early on [5, 31, 
37]. Benenson et al. developed a molecular automaton that could perform a compu- 
tation where the outcome (the release of an antisense drug mimic) was dependent on 
the absence or presence of specific inputs (ssDNA with sequence analogous to diag- 
nostically relevant mRNA) [5]. In our own recent work, we constructed a DNA-based 
molecular classifier that assigns samples to a class (bacterial or viral infection) based 
on the levels of seven distinct mRNA mimics [21]. Zhang et al. extended this work 
by including an amplification step that made it possible to work with actual patient 
samples rather than synthetic or in vitro transcribed RNA, thus bringing molecular
computation for diagnostics closer to a practical application [44]. However, classification
accuracy could be further improved by considering more input features (i.e.,
distinct mRNAs). In fact, an optimal in silico classifier designed to solve the same
problem as our molecular classifier takes into account the levels of over two hundred 
distinct transcripts [41]. Moreover, in silico classifiers designed to solve more com- 
plex computational problems such as identifying a cancer tissue of origin routinely 
take thousands of genes as inputs [22, 28]. It is easy to imagine that future diagnostic 
applications of molecular computation could benefit from technologies that would 
allow the construction and analysis of circuits with thousands of gates. 


1.2 Scaling up DNA Computing for DNA Data Storage 


Digital data will soon outgrow available storage capacity and novel approaches for 
data storage are required. DNA is a promising material for digital data storage because 
of its high information density and durability (reviewed in Ref. [9]). Documented 
suggestions for the use of DNA for digital data storage go as far back as the 1960s [23], 
but only recent improvements in DNA synthesis and sequencing have started making
DNA storage practical [14, 17, 18, 24]. The amount of data stored in DNA increased
from 0.7 MB in 2012 to 200 MB in 2018 and will keep growing at an accelerating 
pace given the interest in this technology from both industry and academia. However, 
having large volumes of data stored in DNA creates new challenges, namely the need 
to access and search these data or perform other forms of information processing. 
Sequencing all stored data and converting them back to electronic form for processing is
impractical and will not scale. A molecular computation approach for performing
data analysis directly in the chemical realm, i.e., “near data,” before converting
results to electronic form, is a more promising alternative [8]. However, although 
some problems such as random access [24], image similarity search [4, 36], image 
preview [40] and some types of Boolean search [2] can be mapped to hybridization 
reactions and thus be implemented at relatively large scale already, other tasks such as 
image classification will likely require more complex computational primitives and 
orders of magnitude more computational elements than are currently feasible [13]. 


1.3 Limitations of Current Approaches to DNA Computing 


Almost all DNA circuitry built to date has been assembled from individually column- 
synthesized oligonucleotides, because this technology provides a low synthesis error 
rate and control over the concentration of individual strands. A low rate of
deletions, insertions, or substitutions is desirable because such errors can result in
side reactions (i.e., “leaks”) once the strand is incorporated into a gate [12, 35, 46].
Control over concentration is necessary because most DNA gates are assembled from
multiple strands with a defined stoichiometry. However, given a cost of roughly $1,000
for 96 oligos synthesized in a 96-well plate format and the need to manipulate oligos
individually, any approach based on ordering column-synthesized oligos cannot scale.

A second, related challenge for scaling up DNA circuits is the time-consuming 
step of gate assembly and purification. Currently, each gate complex needs to be
assembled separately; otherwise, a signal strand meant to be bound to an upstream
gate may bind to a downstream gate instead. Moreover, gate complexes often need
to be gel purified to remove excess strands or partial complexes, which can cause
undesired leak reactions and computation errors.

A third limitation to scaling up DNA circuit computation comes from the use of 
fluorescence-based reporters for reading out the results of a computation. Commercial
fluorometers typically read up to four distinct fluorescence channels and
a similar number of samples in parallel. Plate readers allow monitoring of up to 96
reactions in parallel but are typically limited to reading no more than two fluorescence
channels simultaneously. These limitations mean that only a small number of
variables can be monitored in a given computation (i.e., as many as there are independent
fluorescence channels), and only a limited number of computations can be
performed in parallel (i.e., as many as there are separate reaction chambers).

Other limitations, such as low reaction speed, are briefly considered in the
discussion, but addressing them is not the focus of the work presented here.


2 A Vision for the Future


Pools of array-synthesized DNA (reviewed in [19]) provide a promising alternative to 
individually column-synthesized DNA because cost per oligo is orders of magnitude 
lower and pools with over a million distinct sequences are now available commer- 
cially. Such very large-scale and (relatively) low-cost pools provided the foundation 
for breakthroughs in DNA data storage [14, 17, 24] but so far, array-synthesized 
oligos have not been used for molecular programming applications because of the 
lower synthesis quality, variation in concentration between oligos, and the low yield 
of each individual oligo in the pool. 

Similarly, next-generation DNA sequencing can be used to read out hundreds of 
millions of DNA sequences in parallel but so far has not been used to read out DNA 
computations because current gate architectures are not compatible with readout by
sequencing. Specifically, whether an output strand was released from a gate or is still 
hybridized to it is unlikely to impact whether it is detected in a sequencing reaction 
and thus naive sequencing of a DNA circuit reaction mix cannot reveal the result of 
the computation. 

Building on these technologies is a necessity if we are to scale up the complexity 
and size of DNA nanostructures and circuits. However, doing so will require novel 
gate and structural architectures that are compatible with low abundance and low 
quality DNA. We expect that enzymatic processing will be important in order to 
selectively amplify the array-synthesized DNA and process it into functional gates 
or other devices. 

We note that the idea to use array-synthesized DNA for DNA computing is at 
least a decade old. Qian and Winfree outlined how a Seesaw gate could be made 
through enzymatic processing of a single DNA hairpin synthesized on an array 
[26]. The proposed approach ensures correct stoichiometry of the two component
strands in the final gate and also provides some degree of sequence proof-reading 
due to the enzymatic cleavage step. However, gate concentration is bounded by 
that of intact oligonucleotides in the pool and performance is likely limited by leak 
reactions involving gates derived from truncated or otherwise imperfect oligonu- 
cleotides. 

Here, we introduce an alternative approach for making general purpose DNA logic 
gates from array-derived DNA. Our approach uses PCR amplification to selectively 
amplify full length but low abundance DNA. We further show how DNA sequencing 
can be used to read out the result of a computation, thus providing a path toward
reading out a very large number of gate operations in parallel (Fig. 1). 


3 Results 


In this section, we first introduce our logic gate design and reaction mechanism and
then explain how gates can be processed from single-stranded DNA through PCR
amplification and nicking enzyme digestion. We then provide experimental results
for gate operation and demonstrate readout of up to 380 gates in a single reaction
using DNA sequencing.

Fig. 1 Overview of circuit construction, computation, and readout. Left: DNA gates are derived
from large pools of oligonucleotides through PCR amplification and enzymatic processing. Middle:
computation occurs in a pooled reaction upon addition of inputs. Right: end point results of the
Boolean logic computation are read out in parallel using next-generation sequencing (NGS)


3.1 Nicked Double-Stranded DNA Gates Reaction 
Mechanism 


We use a modified version of the nicked double-stranded DNA (ndsDNA) gate archi- 
tecture proposed by Cardelli [7]. The reaction mechanism for a single-input, single- 
output gate, a logical repeater or sequence translator, is detailed in Fig. 2. The input 
and output have the same domain architecture with a short toehold followed by a 
longer recognition domain. Each complete logic gate consists of a join and a fork 
gate. Several auxiliary species, referred to as helper strands, are also required for the 
reaction. Inputs bind to the join gate and trigger a sequence of strand displacement 
reactions that result in the release of an intermediate output strand. This intermediate 
output then reacts with the fork gate, triggering a second sequence of reactions that 
finally result in the release of the output. The two-component join-fork architecture 
and use of helper strands ensure that the input and output sequences are completely
unrelated and have the same domain architecture. The gate architecture is modular, 
and gates with multiple inputs and outputs can easily be designed. 


3.2 Gate Design 


A key advantage of the ndsDNA gate architecture is that each gate can be derived
from fully double-stranded DNA through enzymatic processing. While in our earlier
work we used plasmid DNA as a starting material in order to minimize the error
rate, we here propose to generate the double-stranded DNA by PCR amplification
of single-stranded DNA synthesized on an array. We will make two modifications
to the architecture introduced in Ref. [12]. First, we will ensure that all toeholds
have the same sequence and are “internal,” i.e., they are flanked by double-stranded
domains on both sides (see the leftmost toehold in Fig. 3 for an example of an internal
toehold). As a result, all toeholds should have very similar binding energies and all
individual strand displacement reactions should occur at similar rates. Second, and
more importantly, we modify the gate architecture to enable readout of gate activation
by sequencing. Specifically, we add a double-stranded domain (shown in blue
or light blue in Fig. 2) to the final helper strand for both the join and fork gate. These
domains do not participate in the strand displacement reactions but are appended to
the gate complex when the reaction has completed. These domains can be detected
by sequencing and can be used to distinguish reacted from unreacted gates.

Fig. 2 DNA components (a) and reaction mechanism (b) for a one-input, one-output gate. The
repeater gate consists of a join and fork component and a set of helper strands. Inputs and outputs
are referred to as signal species. Here, we propose to derive join and fork gates from DNA pools
through enzymatic processing while inputs and helpers are synthetic DNA. The gate architecture is
similar to that described in Ref. [12] with two important exceptions: first, all toeholds are internal,
meaning that they are flanked by double-stranded domains on both sides. This change was made
to ensure that all toeholds have similar binding energies. Second, double-stranded domains (light
and dark blue) appended to some of the helper strands are used for selective PCR and sequencing
of reacted gate species as detailed in the text


A third important point is that we operate in a different regime than that described 
in Ref. [12]. Specifically, we assume that inputs are in excess of the gates such that 
the gates are either fully triggered or completely off. Thus, while these gates were 
initially designed to approximate a desired analog (chemical kinetics) behavior, we 
here treat them as digital logic gates. 
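The digital regime can be sketched as a simple Boolean abstraction. The code below is an idealized model under stated assumptions, not a description of the chemistry: each gate is reduced to the signals its join component consumes and the signals its fork component releases, signals are never depleted (inputs in excess), and a gate is either fully triggered or off. The names `Gate` and `run_circuit` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gate:
    inputs: tuple    # signals consumed by the join component, in binding order
    outputs: tuple   # signals released by the fork component


def run_circuit(gates, present):
    """Propagate signals to a fixed point; return (signals, triggered gates).

    Assumes inputs are in excess, so a signal that is present can trigger
    every gate that accepts it.
    """
    signals = set(present)
    triggered = set()
    changed = True
    while changed:
        changed = False
        for gate in gates:
            if gate not in triggered and all(s in signals for s in gate.inputs):
                triggered.add(gate)
                signals.update(gate.outputs)
                changed = True
    return signals, triggered


and_gate = Gate(inputs=("A", "B"), outputs=("C",))
repeater = Gate(inputs=("C",), outputs=("D",))

both_inputs, _ = run_circuit([and_gate, repeater], {"A", "B"})
one_input, _ = run_circuit([and_gate, repeater], {"A"})
```

With both inputs present the cascade releases `D`; with only one input nothing is triggered. Kinetics, leak, and stoichiometry are deliberately ignored in this abstraction.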


3.3 Making ndsDNA Gates from Array-Synthesized DNA 


The workflow of gate generation is detailed in Fig. 3. An initial gate strand is chem- 
ically synthesized including flanking sequences that can be used for PCR amplifi- 
cation. Using matching primers, this strand is amplified to generate double-stranded 
DNA. The resulting duplex DNA is then processed with nicking enzymes that gen- 
erate breaks in the top strand of the gate and also generate an initiating toehold. 

By using PCR and nicking enzymes to generate each join and fork gate from
a single strand, we overcome two major issues. First, PCR amplification is only
successful if both primer sequences are present. Therefore, the PCR reaction will
preferentially amplify full-length molecules over a background of truncated synthesis
products. Second, most multi-stranded gate architectures require careful control
over stoichiometry, which is currently impossible to achieve with array-synthesized
DNA. Using only a single strand to generate the entire gate overcomes this limitation.
In addition, all strands that are synthesized with the same primer sequences will be
amplified together. By using a common set of primers for all gates that belong to
the same functional module, we can amplify and then process all components of a
circuit of interest in a one-pot reaction.
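The selection effect of PCR can be illustrated with a toy filter over a pool of strands. This is a deliberately simplified sketch: it treats primer binding as exact string matching at the two ends of a strand, ignores mismatches and amplification bias, and does not detect internal errors. All sequences, the pool contents, and the names `amplifiable` and `select_module` are hypothetical.

```python
FWD_PRIMER = "ACGTTGCA"   # hypothetical forward primer sequence (5' flank)
REV_SITE = "GGATCCAT"     # hypothetical binding site of the reverse primer (3' flank)


def amplifiable(strand, fwd=FWD_PRIMER, rev_site=REV_SITE):
    """True if the strand carries both primer sites, one at each end."""
    return (len(strand) >= len(fwd) + len(rev_site)
            and strand.startswith(fwd)
            and strand.endswith(rev_site))


def select_module(pool, fwd=FWD_PRIMER, rev_site=REV_SITE):
    """Strands of `pool` that a given primer pair would amplify together."""
    return {name: seq for name, seq in pool.items()
            if amplifiable(seq, fwd, rev_site)}


pool = {
    "join_AB": FWD_PRIMER + "TTTTCCCCAAAA" + REV_SITE,       # full length
    "join_AB_truncated": "CCCCAAAA" + REV_SITE,               # missing one flank
    "join_BA_truncated": FWD_PRIMER + "TTTTCCCC",             # missing the other flank
    "other_module": "TTGGCCAA" + "AAAATTTT" + "CCAATTGG",     # different primer pair
}
selected = select_module(pool)
```

Only the full-length strand with both flanks survives the filter. Choosing a different primer pair would select a different functional module from the same pool.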

Figure 3 shows an example of join gate processing. In this case, the initial duplex 
was generated by PCR of a single column-synthesized oligo. Enzymatic digestion 
was performed as outlined on the left of the figure. Denaturing gel electrophoresis
showed that digestion yields products of the expected length.


3.4 Characterizing Gate Kinetics 


Before turning to a readout by sequencing, we will characterize the behavior of 
individual gates and small circuits using traditional fluorescence-based assays. For 
an initial test, we generated an AND logic gate (both join and fork gates) starting 
from column-synthesized oligos. Upon PCR amplification and enzymatic digestion, 
we first characterized gate behavior using non-denaturing electrophoresis (Fig. 4, 
left). We then triggered the gate using either both or only one input and measured the 
rate of output release using a fluorescent reporter, following the protocol detailed in 
Ref. [12]. The kinetics data show correct AND logic behavior (Fig. 4, right).

Fig. 4 DNA AND logic gate. a Fork gate configuration before and after reaction. b Native gel
assay shows the expected DNA species for the fork gate before and after reaction. c Fluorescence
kinetics data for the complete AND gate (fork + join + helpers) confirm that signal is high only if
both inputs are present as expected for AND logic

Fig. 5 Reading out DNA logic gates with next-generation sequencing. a Using a sequencing-based
readout rather than a fluorescence-based readout in principle makes it possible to read out millions
of reactions in parallel. However, unlike in a fluorescence-based assay, only reaction end points
can be assessed because of a lack of time resolution, making a sequencing-based readout ideal
for Boolean logic. b After completion of the computation, ligation is used to seal the nicks in
the gate complexes. Because only reacted gates contain the domain shown in light blue, they can
be selectively amplified with PCR. After that, additional DNA domains necessary for sequencing
(purple) are appended to the reacted gates using a standard Illumina ligation protocol


3.5 Reading Out DNA Computation with Next-Generation 
DNA Sequencing 


Our approach relies on high-throughput (Illumina) sequencing to detect whether a 
gate was activated or not. High-throughput sequencing is routinely used to read out 
hundreds of millions of sequences in parallel, and a key advantage of our approach 
is that the state of all gates in a reaction mix can be read out in a single experiment. 
In our work, each sequencing read will correspond to a single gate molecule and will 
provide information about both the identity of the gate and its activation status.

Fig. 6 Preliminary data confirm that gate operation can be read out using next-generation
sequencing. a A logic AND gate and a simplified representation of the corresponding DNA gates.
b Sequencing read counts for the join and fork gate that together make up the AND gate. c A
translator gate and a simplified representation of the corresponding DNA gates. d Sequencing read
counts for two-stage reaction cascades. Each one-input, one-output logic gate consists of a join and
a fork gate. Blue bars indicate counts in the absence of inputs, and red bars show counts if inputs are
present. The results show that two gates can be cascaded and can be read out by sequencing reacted gates


As explained above, reacted gates can be distinguished from unreacted gates 
because an additional double-stranded domain (shown in blue in Fig. 5) is appended 
to the reacted gate. In the sequencing reaction, the entire gate sequence including the 
appended auxiliary domain will be detected. Presence (or absence) of the auxiliary 
domain indicates whether a specific gate participated in a reaction. Moreover, the 
sequence of the gate uniquely identifies the gate (i.e., the sequence shows which 
inputs are accepted and outputs released). The sequencing reaction thus effectively 
amounts to counting the number of reacted (and unreacted) gates of each type. 
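The counting step can be sketched as follows. This is a minimal illustration with made-up data: the gate-identifying subsequences, the auxiliary-domain sequence, the example reads, and the function name `count_gate_reads` are all hypothetical, and exact substring matching stands in for real read alignment.

```python
from collections import Counter

AUX_DOMAIN = "GATTACAGATTACA"                               # hypothetical auxiliary domain
GATE_BARCODES = {"join_AB": "ACCGTT", "fork_C": "TGGCAA"}   # hypothetical gate identifiers


def count_gate_reads(reads, barcodes=GATE_BARCODES, aux=AUX_DOMAIN):
    """Count reacted and unreacted reads per gate.

    A read is assigned to the first gate whose identifying sequence it
    contains, and is called reacted if it also contains the auxiliary domain.
    """
    counts = {name: Counter() for name in barcodes}
    for read in reads:
        for name, code in barcodes.items():
            if code in read:
                status = "reacted" if aux in read else "unreacted"
                counts[name][status] += 1
                break
    return counts


reads = [
    "ACCGTT" + AUX_DOMAIN,   # join_AB, reacted
    "ACCGTT" + AUX_DOMAIN,   # join_AB, reacted
    "ACCGTT",                # join_AB, unreacted
    "TGGCAA",                # fork_C, unreacted
]
counts = count_gate_reads(reads)
```

Comparing the per-gate counts with and without inputs gives the digital readout of each gate. Sequencing errors and reads that match no gate are ignored in this sketch.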

To prepare gates for sequencing, we first use ligation to seal all nicks, followed 
by PCR to enrich full-length products. Next, we use ligation to add the sequencing
adaptors required for Illumina sequencing. These adaptors are used to bind the
product to the flow cell and also serve as binding sites for the sequencing primers. 

To test whether gate operation could be read out by sequencing, we again turned 
to gates derived from individual ultramers (Fig. 6). We tested both a single AND
gate and two distinct cascades of repeater gates. In all instances, we found the gates 
behaved broadly as expected, confirming that sequencing could be used to read out 
the performance of ndsDNA gates. 


3.6 Reading Pools of Array-Derived Gates 


Next, we turned to the challenge of scaling up circuit construction and analysis. 
We asked whether DNA gates could be synthesized on an array and whether gate 
operation could be read out with sequencing. To that end, we designed a set of 
Fig. 7 Preliminary data confirm that gate operation can be read out using next-generation sequencing
in a highly multiplexed format. In this experiment, 380 join gates were tested simultaneously.
Each join gate accepts two inputs and gates were designed to accommodate all possible combina- 
tions of 20 distinct inputs (no gates were designed that accept the same input twice). Although input 
order should not matter for the required logic operation, inputs react with join gates sequentially, 
and two different gate designs are triggered by each input combination. Each entry in the heat map 
corresponds to read counts for one join gate. a In the absence of any inputs, read counts are low. b 
When all inputs are present, read counts are high for all gates. c When only a subset of inputs are 
added to the mix, read counts are high only for the triggered gates 


380 join gates that could be triggered with all possible combinations of 20 distinct
inputs (see Fig. 7). Gates that respond to two copies of the same input, i.e., inputs
(A, A), were not synthesized but we did synthesize gates for both input pairs (A, 
B) and (B, A). Although these are logically identical, the DNA domains are ordered 
differently, potentially resulting in performance variation. In the current iteration of 
our approach, all single-stranded inputs and auxiliary strands are individually column 
synthesized. In principle, it would be possible to also obtain at least some of these 
strands through enzymatic processing. However, we here instead chose to limit the 
number of strands that need to be ordered by reusing sequences between different 
modules that are tested independently. Although we observed some variation in levels 
of triggering, all gates behaved as designed, providing support for the work proposed 
here. Next, we will extend this approach to testing full AND gates, characterizing
gate kinetics, and increasing the size and complexity of the circuit.
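
As a quick check of the gate count, the pool can be enumerated as all ordered pairs of distinct inputs. The sketch below is illustrative only; the input names `in0` to `in19` and the helper `triggered_gates` are hypothetical.

```python
from itertools import permutations

# 20 distinct inputs (placeholder names).
inputs = [f"in{i}" for i in range(20)]

# Ordered pairs of distinct inputs: (A, B) and (B, A) are separate gate
# designs, while (A, A) is never generated.
join_gates = list(permutations(inputs, 2))


def triggered_gates(gates, present):
    """Return the join gates whose two inputs are both present."""
    return [g for g in gates if g[0] in present and g[1] in present]


print(len(join_gates))  # 20 * 19 = 380
```

With k inputs present, k(k - 1) gates are expected to fire, which corresponds to the pattern in panel c of Fig. 7.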


4 Discussion 


In this manuscript, we provided a detailed example of how high-throughput DNA 
synthesis and sequencing technologies can be used to scale up DNA computing. We 
experimentally demonstrated AND logic gates and tested a pool of 380 array-derived join gates.

A potential challenge to scaling up circuit size is that the concentration of each 
gate in a pool is inversely proportional to the size of a pool [26]. That is, if PCR is 
used to amplify a pool with 10 gates to a total concentration of 1 µM, each individual
gate will be at an operating concentration of 100 nM. For a circuit with 10,000 gates, 
the operating concentration of each gate would only be 0.1 nM, likely resulting in a
very slow computation (taking hours or even days) for multi-layered circuits.
However, we believe that there are multiple avenues for overcoming this limitation. 
For example, recent work by our own group and others [10, 15, 16, 30, 39] showed 
that spatial colocalization of circuit elements using DNA origami scaffolds can dra- 
matically accelerate DNA circuit computation. Similar colocalization approaches 
based on DNA nanostructures, beads, or other scaffolds could also be developed for 
array-derived logic gates.
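
The dilution argument can be made concrete with a short calculation. The per-gate concentration follows directly from the text. The half-time estimate is a rough sketch that assumes a simple bimolecular reaction between two species at equal concentration and an assumed rate constant of 1e6 /M/s, which is not a value taken from this work.

```python
def per_gate_concentration_nM(total_uM: float, n_gates: int) -> float:
    """Concentration of each gate (nM) when a pool of n_gates equally
    represented gates is amplified to a total concentration of total_uM."""
    return total_uM * 1000.0 / n_gates


def bimolecular_half_time_s(conc_nM: float, k_per_M_per_s: float = 1e6) -> float:
    """Half-time 1/(k*c) of a second-order reaction between two species
    that both start at concentration c.
    The default rate constant is an ASSUMPTION for illustration."""
    return 1.0 / (k_per_M_per_s * conc_nM * 1e-9)


for n in (10, 10_000):
    c = per_gate_concentration_nM(1.0, n)
    hours = bimolecular_half_time_s(c) / 3600.0
    print(f"{n} gates: {c:g} nM per gate, half-time about {hours:.3g} h")
```

Under these assumptions a single reaction step at 0.1 nM has a half-time of a few hours, consistent with the estimate that multi-layered circuits would take hours or days.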

Another potential drawback of our approach is that it only provides end point data 
rather than full reaction kinetics. While this is not a concern in the context of Boolean 
logic circuits for which we only care whether an output is true or false once the 
computation has completed, it could prove limiting for other applications. However, 
we believe that approaches based on nanopore sequencing which can provide real- 
time readout of reaction progress can overcome this limitation. 

Although this work is still preliminary, we were able to demonstrate parallel 
operation and readout of 380 Join gates in a single reaction. We believe that this 
work will provide a foundation for large-scale DNA computing with applications in 
disease diagnostics and DNA data storage. 
Abstract With recent high-throughput technology, we can synthesize large heterogeneous
collections of DNA structures and also read them all out precisely in a single
procedure. Can we use these tools, not only to do things faster, but also to devise new 
techniques and algorithms? In this paper, we examine some DNA algorithms that 
assume high-throughput synthesis and sequencing. We record the order in which N 
events occur, using N² redundant detectors but only N distinct DNA domains, and
(after sequencing) reconstruct the order by transitive reduction. 


1 Introduction 


With recent high-throughput technology, we can synthesize large heterogeneous 
collections of DNA structures [7, 13] and also read them all out precisely in a single 
procedure [10]. This contrasts with the older practice of assembling structures one 
at a time and of reading them out individually (e.g., by fluorescence), or reading 
them together ambiguously (e.g., by gel electrophoresis). Can we take advantage of 
these high-throughput and high-precision technologies, not only to do things faster 
but also to devise new techniques and algorithms? In this paper, we examine some 
DNA algorithms that assume both high-throughput synthesis and high-throughput 
sequencing: they would not be very practical otherwise. 

A sequence ‘s’ of DNA nucleotides hybridizes (forms a double strand) with its 
reverse Watson-Crick complement denoted ‘s*’; we write the resulting double strand
as an underlined ‘s’. Subsequences of ‘s’ are called domains provided they are independent of each
other, that is, provided that differently identified domains do not hybridize with each 
other, or with significantly long parts of each other [16]. Under normal laboratory 
conditions, a domain ‘a’ is called short if it hybridizes reversibly with ‘a*’, and long 
if it hybridizes irreversibly with it. 
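
For readers less familiar with the convention, the reverse Watson-Crick complement can be computed as follows. This is a minimal sketch over the alphabet A, C, G, T. The function names are illustrative, and the hybridization test is idealized (perfect full-length complementarity, ignoring thermodynamics).

```python
# Watson-Crick pairing: A with T, C with G.
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(s: str) -> str:
    """Return s*, the reverse Watson-Crick complement of s (both 5'-to-3')."""
    return s.translate(_COMPLEMENT)[::-1]


def fully_complementary(s: str, t: str) -> bool:
    """Idealized test: t is exactly the reverse complement of s."""
    return t == reverse_complement(s)


print(reverse_complement("ATGC"))  # GCAT
```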

A short single-stranded domain ‘t’, called a toehold, followed in the same sequence 
by a long single-stranded domain ‘a’ can initiate strand displacement. This is the 
process (later detailed in Fig. 3) where a single-stranded sequence ‘ta’ hybridizes to
a double strand composed of ‘t*’ attached to the bottom strand of a double-stranded 
‘a’. The invading ‘ta’ can displace and possibly replace the existing ‘a’ domain of 
the double strand through a random-walk competition between the two ‘a’ domains 
hybridizing to the same ‘a*’. 

A nick is an interruption in one of the two strands of a double strand, at the 
boundary between two domains. By cascading short and long domains, occasionally 
separated by nicks, we can achieve multi-step strand displacements where the whole 
sequence of displacements can itself be reversible or irreversible. This way we can 
emulate reversible and irreversible chemical reactions [14] and other computational 
abstractions [9]. 

The readout of the outcome of such sequences of displacements is often done 
by fluorescence. Fluorophore/quencher pairs are attached to some domains that par- 
ticipate in the reactions, those in particular whose displacement indicates that a 
significant event has occurred. The displacement separates the fluorophore from the 
quencher and hence induces visible fluorescence. This provides a real-time account 
of the computation, but the readout capability is restricted: a limited number of sepa- 
rate events can be detected by using different fluorescence colors. This is analogous 
to debugging a program by inserting a limited number of print statements at a time, 
each one printing a single letter. 

Another way of achieving a readout is via gel electrophoresis, to distinguish the 
sequences in a solution by their length at the end of the experiment (or at prede- 
termined time points). Many different sequences can be identified provided they 
have different lengths (masses) and provided that we know their length ahead of 
time. Unexpected lengths can be hard to identify. This is analogous to debugging 
a program by using control flow counters to tell us how many times each routine 
is invoked, or each structure is accessed, without any insight about the order of 
events. 

Finally, and especially with more recent high-throughput technology, we can 
obtain a readout by ligating the nicks and sequencing all the strands in the solution 
at the end of the experiment (or at predetermined time points). With high-throughput 
sequencing, we can inspect potentially the entire composition of the solution. The 
debugging analogy now is that of taking a core dump: analyzing in complete detail the 
entire state of a computation, but only infrequently or at the end, and again without 
any obvious insight on the order of events that occurred. 

The order of events is usually of great interest: for example, multiple laborious 
gene knockout experiments are frequently carried out to determine the order of gene 
activations. What if we could instead take a single core dump that tells us the order 
of all the events of interest? To that end, we should record the order of events within 
the state of the system, so that we can inspect such recording at the end as part of the 
core dump. Assuming high-throughput sequencing, we can embed a large amount of 
information within the solution. We are going to assume that we can embed N² pieces
of information, where N is the number of events of interest. This seems achievable 
for reasonably small N while providing a lot of information, encoding for each event 


Sequenceable Event Recorders 297 


whether it happened before, together with, or after any other event. Each one of the N²
event order detectors is a structure that accepts inputs but does not produce outputs: 
when it detects certain conditions, it locks down in a stable state and waits to be 
sequenced later. 

Our strategy is therefore to embed a preorder (a reflexive and transitive relation) 
of events within the solution. This is a preorder because we may not be able to detect
the precise order of two events if they happen very close to each other, in which case
both directions are recorded. With N² detectors, we can determine the order of any
pair of events without needing to coordinate the detectors with each other or with a 
central structure; hence, each detector can be relatively simple. An alternative is to 
use only N detectors that sequentially add records to a central tape, but this requires 
a way of guaranteeing atomic access to the tape [9]. Still, event recorders of the tape 
variety, readable by sequencing, have been nicely demonstrated using natural DNA 
and protein mechanisms [12, 15]. 

A preorder is not the entire history of a computation. We are considering the 
preorder of first-occurrence of events: any subsequent occurrences for the same signal 
are not recorded. This limited information can still provide support for causality: if 
an event always precedes another event over a number of runs, then this supports the 
first event causing the second, or having a common cause. 

In the rest of this paper, we aim to describe the architecture of such a preorder 
recorder, using DNA strand displacement technology, slowly building up from sim- 
pler problems. A property of all the designs in this paper is that (apart from the 
single-stranded input signals) all DNA structures are nicked double strands with no 
additional modifications or secondary structure. Therefore, the required and poten- 
tially large numbers of components can be fabricated by bacterial cloning as a single 
or a few long DNA double strands, followed by enzymatic cutting and nicking [4] 
(see Appendix). Other technologies for high-throughput synthesis of large hetero- 
geneous libraries exist [7, 13]. Thus, we rely on both high-throughput synthesis for 
producing the N² detectors and on high-throughput sequencing to read them out.


2 Occurrence Recorder 


We begin by investigating the simplest event recorder: recording the occurrence of 
events at any time during an experiment. By an ‘event’ here, we mean the appear- 
ance of a whole population of identical molecules and in fact a specific structure of 
molecules that can be uniformly identified. Any event that does not fit that description 
must first be transduced into one of these uniform molecular structures. By a signal, 
we mean a population of one such molecular species over time, and by an event, we 
mean the appearance of a signal population (we do not detect the disappearance of 
a population). 

In discussions, we summarize DNA structures by a textual notation. In addition 
to lowercase letters like ‘a’ for single-stranded long domains, and underlined letters 
like ‘a’ for the corresponding double-stranded long domains, a single short domain 
     a                    a                     a
+-+----->              +-----+->             +-----+->
     a       q              a       q             a       q
  +-----+->----->F <=> +-+-----> +----->F => +-+----->-+----->
<-+-----+-+-----+Q     <-+-----+-+-----+Q    <-+-----+-+-----+Q
     q                      q                   q
+-+----->              +-+----->             +----->F


Fig. 1 Yes gate for detection by fluorescence 


is used for all toeholds: ‘~’ is an open (i.e., un-hybridized) toehold on a single strand
or on the upper strand of a double strand, ‘_’ is an open toehold on the lower strand
of a double strand, and ‘-’ is a covered (double-stranded) toehold. A sequence of
domains on a double strand with an initial open toehold and an intermediate covered
toehold looks like ‘_a-b’. This summary notation omits information about nicks,
which are instead detailed in corresponding figures. Note that, before sequencing, 
all the open domains should be complemented, and all the nicks should be ligated. 

Figures instead depict the corresponding single and double strands graphically 
(e.g., Fig. 1). A domain is a short or long sequence of dashes ‘-’ with domain delimiters
‘>’ and ‘<’ pointing in the 5’-to-3’ direction to indicate either a nick (an
interruption in the strand) or the 3’ end, and ‘+’ to indicate the 5’ end or the logical 
boundary of a domain (not a nick). The name of a domain is a lower-case letter 
placed on top of the upper strand, with implicitly the reverse complement domain 
on the lower strand. All toeholds are the same sequence: they have a blank name. 
Reversible reactions are ‘<=>’ and irreversible reactions are ‘=>’. 


2.1 Yes Gate 


The events that we want to detect are represented by single strands ‘~a’, each consisting
of a (short) toehold ‘~’ attached to a (long) domain ‘a’. If ‘~a’ is ever present,
we want to know about it: this is the purpose of the Yes gate for ‘a’.

First (Fig. 1), let us consider the traditional way of detecting ‘~a’. A double-stranded
structure ‘_a-q’ with an open toehold ‘_’ accepts the single strand ‘~a’
(reversibly) and opens up another toehold, yielding ‘-a_q’. That structure then locks
down (irreversibly) by combining with an auxiliary single strand ‘~q’ to produce the
fully hybridized ‘-a-q’ and the toehold-free ‘q’.

If we attach a fluorophore (F) and quencher (Q) pair at the right end of ‘_a-q’ (and
not at the end of ‘~q’), we can detect the occurrence of ‘~a’ because it separates F from
Q and causes visible fluorescence. However, if we were to (ligate and) sequence the 
solution, it would be difficult or impossible to tell the difference between the initial 
and final state, because they differ only by open toeholds and by the positions of 
nicks that are erased by ligation. 
     a                   a                    a
+-+----->             +-----+->            +-----+->
     a       q             a       q            a       q     r
  +-----+->-----> <=> +-+-----> +-----> => +-+----->-+-----+----->
<-+-----+-+-----+     <-+-----+-+-----+    <-+-----+-+-----<-----+
     q     r               q     r            q
+-+-----+----->       +-+-----+----->      +----->
  <-----<-----+         <-----<-----+      <-----+


Fig. 2 Yes gate for detection by sequencing 


[Strand diagram not recoverable from the source: the 3-way branch migration of the first step of Fig. 2 (left) and the 4-way branch migration of its last step (right).]


Fig. 3 3-way and 4-way displacements in the first and last steps of Fig. 2 


Let us now consider (Fig. 2) an additional domain ‘r’ that will help us tag the
desired outcome. The ‘~q’ single strand is replaced by a ‘~qr’ double strand, but
with a nick on the bottom between ‘q’ and ‘r’.1 The first reaction is the same as
before, but the second reaction is now a 4-way strand displacement2 (Fig. 3 right).
This detector is non-catalytic: it captures some of the ‘~a’ strands and releases ‘a~’
strands (which are usually harmless).

If this gate is triggered, then the main outcome is ‘-a-qr’, which is a nicked but
fully complemented double strand: it is ready for ligation and sequencing. If the gate
is not triggered, then the outcome is the initial ‘_a-q’, which is distinguishable after
sequencing.3

For a catalytic version (one that does not sequester the input), consider the design 
in Fig. 4: we add two more structures to Fig. 2 that absorb the ‘a~’ that was left over 
and convert it back to a free ‘~a’. Such catalytic irreversible gates avoid sequestering


1 The ‘q’ domain can be single-stranded, interacting by a simpler 3-way displacement, but that would
rule out producing the structure directly by cloning [4]. If the whole ‘~qr’ were single-stranded,
a polymerase could not attach to the final structure to complement the ‘r’ domain, as required for
sequencing and positive detection, but a PCR step could be used instead.


2 4-way strand displacement is slower than 3-way ([8], Chap. 5), although potentially more robust [6].
This may degrade the ability of our algorithms to separate events in time, but otherwise it does not
affect their logic, which includes the possibility of coincidence of events. The 4-way displacement
is of the unusual ‘open’ kind [8], that is, initiated by a single toehold binding instead of two.


3 ‘_a-q’ needs to be fully complemented, ligated, and sequenced. To that end, we can add an additional
double-stranded domain on the left of the initial toehold, as in [5]. This allows a polymerase
to proceed in the 3’-5’ direction of the bottom strand and fully complement the top strand. For this
presentation, we omit these domains because they have no other function and do not participate in
the described reactions. Moreover, even if sequencing misread the initial state, we would still get
our answer by the presence or absence of the final state ‘-a-qr’ + ‘q’.
   a                       a                    a
+-----+->             +-+----->            +-+----->
   u       a             u       a            u       a
+----->-+----->   <=> +-----> +-----+-> => +-----+->-----+->
<-----+-+-----+-+     <-----+-+-----+-+    <-----+-+-----+-+
   u                     u                    u
+-----+->             +-----+->            +----->
<-----+               <-----+              <-----+

Fig. 4 Catalytic Yes gate, additional reactions 


weak signals, while being fully activated by weak signals, leading to robust detection 
(if the signals are not drained too quickly by downstream processing). 


2.2 Occurrence Recorder Algorithm 


We can use Yes gates to detect a collection of signals in an experiment via a single 
high-throughput readout: we prepare a Yes gate detector for each signal, we mix 
them in at the beginning, and we sequence the entire solution at the end, revealing 
any detectors that have fired. 


3 Coincidence Recorder 


We now move to a more interesting task: detecting the simultaneous presence of 
signals. The idea enabling the sequencing-based readout of gates, and in particular 
the novel use of 4-way displacement, is due to Chen and Seelig [5] (the Yes gate of 
Fig. 2 is also a special case of this). Their design was originally meant as an AND 
gate made of a sequenceable Join part accepting inputs, and a sequenceable Fork 
part producing outputs. We are going to use just a sequenceable Join half to detect 
the simultaneous occurrence of any pair of signals in a given set of signals, relying 
on high-throughput sequencing to inspect all possible combinations. 


3.1 Join Gate 


The design in Fig. 5 is rooted in a fluorophore-oriented Join gate, along the lines of 
Fig. 1, which ultimately comes from [11] and [2]. However, again here we want to find 
a sequencing-friendly version, where the initial structures ‘_a-b-q’ and ‘~qr’ with
input signals ‘~a’ and ‘~b’ are sequencing-distinguishable from the final structure
‘-a-b-qr’, which indicates that both signals were present at the same time. The gate
     a                           a
+-+----->      b              +-----+->      b
          +-+----->                     +-+----->
     a       b       q             a       b       q
  +-----+->-----+->-----> <=> +-+-----> +-----+->-----> <=>
<-+-----+-+-----+-+-----+     <-+-----+-+-----+-+-----+
     q     r                       q     r
+-+-----+----->               +-+-----+----->
  <-----<-----+                 <-----<-----+

   a                            a
+-----+->    b               +-----+->    b
          +-----+->                    +-----+->
     a       b       q            a       b       q     r
+-+----->-+-----> +-----> => +-+----->-+----->-+-----+----->
<-+-----+-+-----+-+-----+    <-+-----+-+-----+-+-----<-----+
     q     r                    q
+-+-----+----->              +----->
  <-----<-----+              <-----+


Fig. 5 Join gate for detection by sequencing 


locks down when the two signals are received in turn. If one signal appears first and 
persists until the second arrives, this gives the same result as both signals appearing
together. If one signal is removed before the other one appears, the gate reverts 
and the result indicates no co-occurrence. The gate can be made more kinetically 
symmetrical by mixing Join(a,b) with Join(b,a). 

As in the previous case, we can add structures to this gate that convert it to 
a catalytic gate. But we need to handle the two signals together in the additional 
structures, because ‘a~’ must be able to revert to ‘~a’ when ‘~b’ is not present.
Hence, we use the binary structure in Fig. 6 for distinct a,b. This structure cannot
coexist with a catalytic Yes(a) as it would lock down Join(a,b) on the first input:
a later non-coincident ‘~b’ would give a false positive. Join(a,a) must not have the
additional catalytic structures for the same reason: it is best to replace it with a 
non-catalytic Yes(a). 


3.2 Coincidence Recorder Algorithm 


We can use Join gates to detect the simultaneous occurrence of any pair of distinct 
signals in a collection: we prepare a Join gate detector for each such pair, we mix 
them in at the beginning, and we sequence the entire solution at the end, revealing 
any detectors that have fired. If we detect Join(a,b) and Join(b,c), we can deduce the 
               a                                      a
   b        +-----+->                     b      +-+----->
+-----+->                            +-+----->
   v       b       a                    v       b       a
+----->-+----->-+----->   <=> ... => +-----+->-----+->-----+->
<-----+-+-----+-+-----+-+            <-----+-+-----+-+-----+-+
   v                                    v
+-----+->                            +----->
<-----+                              <-----+

Fig. 6 Catalytic Join gate, additional reactions 


coincidence of a and c, and we should also detect Join(a,c): that redundancy serves as 
a crosscheck. We could use fewer Join gates, but if we did not include the transitive 
Join(a,c) and b never came, we would not detect the coincidence of a and c. 
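
The redundancy crosscheck can be expressed as a small consistency test on the set of fired detectors. The sketch below is an illustration only. It treats Join(a,b) and Join(b,a) as the same detector, and the function name `missing_transitive_pairs` is hypothetical.

```python
from itertools import combinations


def missing_transitive_pairs(fired):
    """Return pairs (a, c) implied by Join(a,b) and Join(b,c) having fired
    but for which Join(a,c) itself was not observed.

    fired -- iterable of 2-tuples of signal names; order within a pair is
             ignored in this sketch.
    """
    fired = {frozenset(pair) for pair in fired}
    signals = set().union(*fired)
    missing = set()
    for a, c in combinations(sorted(signals), 2):
        if frozenset((a, c)) in fired:
            continue
        # Look for an intermediate b that coincided with both a and c.
        if any(frozenset((a, b)) in fired and frozenset((b, c)) in fired
               for b in signals - {a, c}):
            missing.add((a, c))
    return missing


print(missing_transitive_pairs([("a", "b"), ("b", "c")]))  # {('a', 'c')}
```

A non-empty result flags readouts in which the redundant detector disagrees with what the other detectors imply.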


4 Preorder Recorder 


We now aim to build a device to record the order of occurrence of events in an exper- 
iment. The question is: given a set of events a, b, c, d, ... that occur in some order, 
in what order did they first occur? If some events can occur together (up to experi- 
mental uncertainty), the relationship is a preorder: a reflexive and transitive relation. 
We want to reconstruct the temporal preorder of events from a single observation at 
the end of a run, with a single mass sequencing. 

Such a preorder recorder would be useful for monitoring a process over time 
without sampling the system at multiple time points. Our recorder does not record 
timing and does not record sequence, but it records the first-occurrence preorder, 
storing it within the system itself. Recording the order, rather than the full timing of 
events, means that we need not use energy during periods of inactivity, and we need 
not worry about how often we should sample the system. The energy expenditure is 
all preloaded: no additional resources are needed no matter how long or complex the 
events history becomes, and there can be no ‘memory overflow’ of the recording. 
Repeated preorder experiments can build up evidence for causality, by observing 
which events always happen in the same order, independently of timing and other 
conditions. 

The algorithm below uses a number of gates that is quadratic in the number 
of signals N but is independent of the observation time. After the initial setup, it 
requires no further energy because it reacts to signals and does not actively inspect 
the environment for their presence. More subtly, the algorithm uses a number of 
distinct domains that is just N + 4 (+1 for toeholds). This is important to avoid 
crosstalk among domains, which becomes more difficult to avoid when we have more 
domains. The situation would be much worse if we needed N² distinct domains in
addition to N² gates.
The coincidence recorder in the previous section was obtained by iterating a Join
gate. For the preorder recorder, we iterate a choice gate, which we describe next. 
Instead of presenting directly the domain structures (for which there are multiple 
possibilities), we first describe abstractly how the choice gate behaves, and how the 
algorithm uses it. The DNA implementation is described later. 


4.1 Choice Gate Specification 


A choice gate is a two-input gate denoted a?b between input events a and b. As an 
abstract operator it is symmetric: a?b = b?a. Its desired behavior is as follows: 


• If a arrives no later than b, then a?b produces a distinct result that we indicate
a < b or equivalently b > a.

• If b arrives no later than a, then a?b produces a distinct result that we indicate
b < a or equivalently a > b.

• If a and b arrive together, then a?b produces a result that we indicate a ~ b or
equivalently b ~ a. (This is in practice an equal mixture of a < b and b < a, or
an unequal mixture if they arrive slightly offset.)

• As a special case, if a ever arrives, then a?a produces a result a ~ a.


The three results between different a and b are assumed to be distinct and distin- 
guishable by sequencing. Our algorithm requires only that there are three detectable 
final configurations: a < b and b < a depending on which of two inputs arrives first, 
and a mixture of the two, a ~ b, if they arrive together. We may further analyze the 
results quantitatively: a 100%/0% mixture of a < b and b < a indicates that enough 
of a arrived to exhaust the gate population before any of b (if any) arrived. Other 
mixtures may indicate how much events overlapped in time, their relative strength, 
or some confusion between those. Weak signals may appear to have arrived together. 
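
One way to turn sequencing counts for a single choice gate into one of the three outcomes is sketched below. The cutoff `threshold` is a hypothetical analysis parameter, not something prescribed by the specification; a real analysis would have to calibrate it against read noise.

```python
def classify_choice(count_a_lt_b: int, count_b_lt_a: int, threshold: float = 0.9):
    """Classify the readout of a choice gate a?b from read counts.

    Returns "a<b", "b<a", "a~b", or None when no product was observed
    (neither input arrived). threshold is an ASSUMED cutoff on the fraction
    of reads that must agree before an order is called.
    """
    total = count_a_lt_b + count_b_lt_a
    if total == 0:
        return None
    fraction = count_a_lt_b / total
    if fraction >= threshold:
        return "a<b"
    if fraction <= 1.0 - threshold:
        return "b<a"
    return "a~b"  # mixed population: the inputs overlapped in time


print(classify_choice(100, 0))  # a<b
print(classify_choice(50, 50))  # a~b
```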

There are many ways to achieve this specification, and we will discuss at least 
two. But first we describe the algorithm that uses these gates. 


4.2 Preorder Recorder Algorithm 


Suppose we have a (moderately large) set of events a, b, c, d, e, f, ... , like the 
occurrence of some mRNAs in a cell-free extract. They will activate in some order 
like b.cd.ae.d (b first, then cd together, then ae together, then d). We want to store 
that order as the events arise, and read it back at the end. 

For N signals, we need N² distinct DNA structures: all the possible combinations
of two signals, including all the x?x cases. We are not going to distinguish event 
sequences with repetitions and oscillations: we only look at the first occurrence of a 
signal. For example, the sequence b.b.b is the same as b for us, and a.b.a is the same 
as a.b (we can still tell that the first a arrived before the first b: the second a does not
confound it). 

We do not provide any external timing: there are no clocks needed to sample these 
signals over time, and there is no predetermined sampling frequency. We just need to 
assume that the sequence of events is slow enough. If it is not slow enough then a.b 
will look just like ab (in practice, the closeness of two signals will be reflected in the 
relative proportions of a < b and b < a, so we can still get some more information). The
time resolution is thus determined by the speed of the DNA reactions. If they happen 
to be fast enough for the intended observed system, then sampling over longer time 
periods does not require any more gates or any more energy: the gates just naturally 
sit waiting for the signals to arrive. 

The input to our algorithm is a preorder of signals, like a.bc.def.g, that is occurring
in real time in our experiment. We initially add to the solution all the choice gates 
x?y such that x and y range over all those signals (including x = y). At the end, 
we sequence all the leftover structures (e.g., x < y) and we reconstruct the preorder 
from them. The process of reconstructing the preorder graph from what is essentially 
its reachability matrix is called transitive reduction and has the same complexity as 
transitive closure and matrix multiplication [1]. 
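
The reconstruction step can be illustrated with a naive sketch. It assumes the sequenced outcomes have already been converted into a set `leq` of pairs (x, y) meaning "x occurred no later than y", that this set is reflexive and transitively consistent, and it uses a simple cubic-time reduction rather than a matrix-multiplication-based method. All function names are illustrative.

```python
def reconstruct_preorder(leq):
    """Group coincident events and compute the transitive reduction.

    leq -- set of (x, y) pairs meaning 'x occurred no later than y'.
    Returns (classes, edges): classes is a list of lists of coincident
    events; edges is a set of (i, j) index pairs meaning classes[i]
    immediately precedes classes[j].
    """
    events = sorted({e for pair in leq for e in pair})
    classes = []
    for e in events:
        for cls in classes:
            # Coincident events were recorded in both directions.
            if (e, cls[0]) in leq and (cls[0], e) in leq:
                cls.append(e)
                break
        else:
            classes.append([e])
    reps = [cls[0] for cls in classes]
    before = {(i, j) for i, x in enumerate(reps) for j, y in enumerate(reps)
              if i != j and (x, y) in leq}
    # Drop i -> j whenever some class k lies strictly between i and j.
    edges = {(i, j) for (i, j) in before
             if not any((i, k) in before and (k, j) in before
                        for k in range(len(reps)))}
    return classes, edges


def chain_string(leq):
    """Render a total preorder in the dotted notation, e.g. 'a.bc.d'."""
    classes, _ = reconstruct_preorder(leq)

    def rank(cls):
        # Number of events recorded as no later than this class.
        return sum(1 for (_, y) in leq if y == cls[0])

    return ".".join("".join(cls) for cls in sorted(classes, key=rank))


example = {("a", "a"), ("b", "b"), ("c", "c"), ("d", "d"),
           ("a", "b"), ("a", "c"), ("a", "d"),
           ("b", "c"), ("c", "b"), ("b", "d"), ("c", "d")}
print(chain_string(example))  # a.bc.d
```

In the example, the redundant pair (a, d) is removed by the reduction, leaving only the edges from a to bc and from bc to d.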


4.3 Crosstalking Choice Gate 


We now describe a DNA implementation of the choice gate a?b. We discuss below 
how the gates crosstalk, and what are the consequences of crosstalking. But in sum- 
mary, for our application, this implementation is sufficient, and it is also considerably 
more economical than a ‘proper’ non-crosstalking implementation. 

The inputs are the usual two-domain signals with toehold on the left. For each 
abstract choice operator a?b, we use two pairs of double strands abbreviated as group 
[a?b| and group |b?a], with a?b = [a?b| + |b?a]. They are symmetric but different 
because [a?b| reacts to a ‘~b’ strand, while |b?a] reacts to an ‘~a’ strand. Conversely,
[a?b| reacts also to an ‘a~’ strand and |b?a] reacts also to a ‘b~’ strand, through the
same toehold but in opposite directions. 

In Fig. 7, each of the primary structures (top) eventually binds to one and only
one of the two end caps (bottom): we arbitrarily associate one end cap with [a?b| 
and the other with |b?a] (the square bracket indicates the side the end cap is with), 
so in fact a?b = [a?b| + |b?a] = [b?a| + |a?b] = b?a. The central portions with the 
‘a’ and ‘b’ domains are surrounded by four fixed domains ‘s’, ‘p’, ‘q’, ‘r’: these are 
the same sequences for all the choice gates, regardless of variations in ‘a’ and ‘b’.4 
The nameless toehold is the same sequence everywhere. 




4 In fact, all four ‘s’, ‘p’, ‘q’, ‘r’ domains can be the same sequence without ambiguity in the
outcomes of Fig. 8. Still, we keep them distinct in light of other possible constraints, such as in 
Fig. 10. 
[a?b|                              |b?a]
   p       a       b       q          p       b       a       q
+----->-+-----> +-----+->----->    +----->-+-----> +-----+->----->
<-----+-+-----+-+-----+-+-----+    <-----+-+-----+-+-----+-+-----+
   s     p                              q     r
+-----+-----+->                    +-+-----+----->
<-----<-----+                        <-----<-----+


Fig. 7 Crosstalking choice gate


a>b                                      b<a
   p       a       b       q     r          s     p       b       a       q
+----->-+----->-+----->-+-----+----->    +-----+-----+->-----+->-----+->----->
<-----+-+-----+-+-----+-+-----<-----+    <-----<-----+-+-----+-+-----+-+-----+
   q                                        p
+----->                                  +----->
<-----+                                  <-----+
b>a                                      a<b
   p       b       a       q     r          s     p       a       b       q
+----->-+----->-+----->-+-----+----->    +-----+-----+->-----+->-----+->----->
<-----+-+-----+-+-----+-+-----<-----+    <-----<-----+-+-----+-+-----+-+-----+
   q                                        p
+----->                                  +----->
<-----+                                  <-----+


Fig. 8 Crosstalking choice gate outcomes for a?b. Top: for input ‘-b’ (red), which also releases 
back ‘-b’ (teal, not shown). Bottom: for input ‘-a’ (red), which also releases back ‘-a’ (blue, not 
shown) 


If a signal ‘-b’ (with toehold on the left) binds to [a?b|, it blocks the toehold and 
displaces to the right. It also releases ‘b-’ (with toehold to the right), which goes to 
|b?a], again blocks the toehold there, and displaces to the left, catalytically releasing 
a copy of the original ‘-b’. The end caps can bind to the remaining open toeholds 
and lock down the configuration. If ‘-a’ arrives later, it finds all the toeholds blocked 
and cannot bind to the remaining structures. Thus ‘-b’ arriving first prevents ‘-a’ 
from binding later. If ‘-a’ arrives first, the situation is symmetric, with the end caps 
binding to the opposite structures from those in the ‘-b’-first case. 

In more detail, the initial binding of signals opens up new toeholds for the 
double-stranded ‘sp-’, ‘-qr’ end caps: they cause 4-way strand displacements and stabilize 
the outcomes in a way that is distinguishable by sequencing. For a ‘-b’ input the 
final structures are ‘p-a-b-qr’ + ‘q’, which is the result we earlier called a > b, 
and ‘sp-b-a-q’ + ‘p’, which is the result we earlier called b < a (Fig. 8, top). The 
opposite happens if ‘-a’ arrives first (Fig. 8, bottom). If ‘-a’ and ‘-b’ arrive together, 
then both results are produced because the released ‘a-’ and ‘b-’ bind concurrently 
to as yet untouched copies of the gates. 

These activations are irreversible and catalytic: ‘-a’ and ‘-b’ are released back 
without requiring additional structures. This is going to help kinetically and is also 
less likely to perturb the system we are observing. Reflexive gates a?a work as 
expected: we need them to signify that a signal ‘-a’ has arrived at some time. We 
produce the a?a structures by the general recipe, meaning as [a?a| + |a?a], hence 
with twice the concentration of the main structure. This is in fact what we need to 
keep the kinetics balanced with respect to non-reflexive gates. 


bxa axb 
+-+----- > +-+----- > 
<----- + <----- + 
p axb b bxa q p bxa a axb q 
+----- >-+----- > +----- +->----- +->----- > +----- >-+----- > +----- +->----- +->----- > 
<-----+-+-----+-+-----+-+-----+-+-----+ <-----+-+-----+-+-----+-+-----+-+-----+ 
s p q r 
+----- +----- +-> +-+----- +----- > 
<----- <----- + <----- <----- + 


Fig. 9 Non-crosstalking choice gate 

A single choice gate works as described, but we need to consider the situation 
where there are multiple choice gates together. In a gate with [a?b|, the input ‘-b’ 
releases ‘b-’, which goes on to bind to |b?a], but also to any other |b?x]: crosstalk! 
Normally this would be incorrect, but here we want to activate |b?x] as well, since 
it tells us that ‘-b’ arrived before ‘-x’. If there is a |b?x], then there is also an [x?b|, 
which, driven by ‘-b’, activates |b?x] anyway. So the crosstalk between gates does not 
hurt in this particular instance. The most interesting consequence is that, as we noted, 
although we have N² gates, we only have to encode N distinct domains (plus the 
4 auxiliary ones). This greatly reduces the potential interference between domains 
that would be an obstacle to scaling up the number of signals. As an added benefit, 
these crosstalking gates are automatically catalytic (cf. Fig. 9). 

As an example, for 3 signals a, b, c, we use the following 9 choice gates (first 
column) and corresponding initial structures (second column): 


gates   structures      after ‘-c’      after ‘-b’ 
a?a     [a?a| |a?a]     [a?a| |a?a]     [a?a| |a?a] 
b?b     [b?b| |b?b]     [b?b| |b?b]     b>b b<b 
c?c     [c?c| |c?c]     c>c c<c         c>c c<c 
a?b     [a?b| |b?a]     [a?b| |b?a]     a>b b<a 
a?c     [a?c| |c?a]     a>c c<a         a>c c<a 
b?c     [b?c| |c?b]     b>c c<b         b>c c<b 


If a signal ‘-c’ arrives, it initially activates 3 structures, the ones of the form 
[x?c|, producing outcomes x > c. Soon after, the signal ‘c-’ that is released by those 
activations crosstalks with the structures of the form |c?y], producing outcomes 
c < y (third column). If a signal ‘-b’ arrives next, it further activates some gates, 
but not the ones that have been used up by ‘-c’ (fourth column). If we sequence the 
structures at this point, we can conclude (with multiple redundancies) that: 
c < b < a 


That’s a definite c < b, because we observe c < b but not b < c. Moreover, we 
do not observe a < a which means that a never arrived. If we were to observe c < b 
and b < c, then we would deduce that c, b arrived together, up to our time resolution. 
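
The bookkeeping in the table above can be sketched in a few lines of Python. This is a gate-level abstraction under stated assumptions, not a model of the strand displacement chemistry: each unordered pair of signals is treated as one gate that is used up by whichever of its two signals arrives first, and simultaneous arrivals are not modeled. The function name `record` and the string encoding of outcomes are illustrative choices that do not come from the text.

```python
from itertools import combinations_with_replacement


def record(signals, arrivals):
    """Return the set of outcomes observed after the given arrival sequence.

    Assumption: a gate x?y is used up by the first of its signals to arrive.
    If that signal is s and the other signal of the gate is o, the gate ends
    in the two outcomes 'o>s' and 's<o' (for a reflexive gate, o equals s).
    """
    # One gate per unordered pair, including the reflexive gates a?a.
    gates = {frozenset(pair): None
             for pair in combinations_with_replacement(signals, 2)}
    for s in arrivals:
        for gate, outcome in gates.items():
            if outcome is None and s in gate:
                other = next(iter(gate - {s}), s)
                gates[gate] = {f"{other}>{s}", f"{s}<{other}"}
    observed = set()
    for outcome in gates.values():
        if outcome is not None:
            observed |= outcome
    return observed


# Signal c arrives first, then b; a never arrives.
observed = record("abc", ["c", "b"])
print(sorted(observed))
```

Under these assumptions the result contains c<b but not b<c, and contains no a<a, which is the pattern the text reads as "c before b, and a never arrived".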

Detection of the preorder should be robust because of the redundancies. Back- 
ground noise and bad gates can be tolerated, because we just need to detect which of 
a < b vs. b < a is strongest. Moreover, our set of observed structures must be tran- 
sitively closed: if the input is the sequence a.b.c then we should observe a < b (and 
not b < a) and b < c (and not c < b), and transitively also a < c (and not c < a). 
The transitive closures can act as consistency checks. 
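
As a hedged illustration of this consistency check, the sketch below extracts the strict order from a set of observed outcomes and tests whether it is transitively closed. The helper names `strict_order` and `is_transitively_closed` are hypothetical, outcomes are encoded as plain strings such as 'c<b', and the noise handling described above (comparing signal strengths) is deliberately left out.

```python
def strict_order(observed):
    """Pairs (x, y) such that 'x<y' was observed, 'y<x' was not, and x != y."""
    before = {tuple(o.split("<")) for o in observed if "<" in o}
    return {(x, y) for (x, y) in before if x != y and (y, x) not in before}


def arrived(observed):
    """Signals x for which the reflexive outcome 'x<x' was observed."""
    pairs = (o.split("<") for o in observed if "<" in o)
    return {x for (x, y) in pairs if x == y}


def is_transitively_closed(pairs):
    """True if x<y and y<z in pairs always implies x<z in pairs."""
    return all((x, z) in pairs
               for (x, y) in pairs
               for (y2, z) in pairs
               if y == y2)


# Outcomes from the worked example: c arrived, then b; a never arrived.
observed = {"c<c", "c<a", "c<b", "b<b", "b<a",
            "c>c", "a>c", "b>c", "b>b", "a>b"}
order = strict_order(observed)
print(sorted(order))
print(is_transitively_closed(order))
print(sorted(arrived(observed)))
```

A failed closure check would point to a missing or spurious outcome rather than to a particular culprit, so in practice it serves only as a sanity check on the sequencing data.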


4.4 A “Proper” Choice Gate 


If we want to use a choice gate in some general and modular way within some bigger 
design, then we need a gate that respects all the conventions, and in particular that 
does not crosstalk with unrelated gates. In the design in Fig.9 the domains called 
‘axb’ and ‘bxa’ are uniquely determined by ‘a’ and ‘b’ to avoid crosstalk with other 
gates. Here, a ‘b-’ input does not release a ‘-b’ signal that connects with other gates, 
but rather a ‘-bxa’ signal that binds uniquely to the other half of that choice gate. 
In our preorder recorder application, where we use N² gates, we would now need 
N + N² distinct signal domains. Other than that, this choice gate could replace the 
crosstalking one. A catalytic version can be obtained as in Fig. 4. 


5 Conclusions 


We have described a class of DNA algorithms designed to take advantage of high- 
throughput sequencing and also relying on high-throughput synthesis. A combina- 
torial number of different structures are activated on demand without any timing or 
synchronization, operating by natural parallelism. The outcome is produced not as 
an output but as the final state of the system to be read by sequencing. 


Acknowledgements Thanks to Matthew Lakin, Georg Seelig, and David Soloveichik, for helpful 
comments, and to Yuan-Jyue Chen and Georg Seelig for initial discussions that led to this paper. 
Appendix: Restriction Enzymes 


Naturally occurring restriction enzymes bind to double-stranded DNA and make a 
double-strand cut. Some natural enzymes and some engineered versions of restriction 
enzymes cut only one of the strands: these are called nicking enzymes [3]. Although 
there are many such enzymes used for cutting natural and synthetic DNA, their 
properties are severely restricted.5 In nature, there is also a Cas9 protein that is 
essentially a programmable restriction enzyme: it can cut DNA at almost any desired 
location determined by a separate RNA strand. 

Let us imagine that we are able to design our own restriction enzymes. We 
show that it is then possible to cut and nick the crosstalking choice gates in Fig.7 
out of a longer DNA double strand. Such a strand is ideally obtained by bacte- 
rial cloning, which can produce large quantities of very long high-quality strands, 
enabling the mass production of DNA gates by cutting them out of long cloned 
strands [4]. 

There are many alternatives to the hypothetical choices of restriction enzymes 
shown below, depending on where the DNA cuts are located with respect to the 
binding sites of an enzyme. But let’s assume the following possibilities: 




• ‘B..#’ is a blunt-cutting enzyme that binds to a recognition sequence ‘B’ and 
makes a blunt double cut (i.e., at the same location on both strands) indicated by 
#, at some non-critical distance toward 5’. 

• ‘^..L’ is a nicking enzyme that binds to a recognition sequence ‘L’, and makes a 
nick indicated by ^ toward 3’, on the opposite strand, at a distance corresponding 
to a toehold length. 

• ‘R..^’ is a nicking enzyme that binds to a recognition sequence ‘R’, and makes a 
nick indicated by ^ toward 5’, on the opposite strand, at a distance corresponding 
to a toehold length. 


The embedding of the ‘B’ binding sequence is straightforward because it can be 
placed outside the gates, in the surrounding DNA. The main gate structures however 
have several internal nicks; hence, the enzyme binding sites must be placed inside 
the signal domains (we cannot assume they can cut precisely at a very long distance 
from their binding site). Since these domains occur twice with different surrounding 
nicks, the placement of the binding sequences is non-trivial. However, the follow- 
ing scheme is adequate: each domain used for encoding signals has ‘L’ and ‘R’ 
enzyme binding sequences embedded as in Fig. 10 top left: they produce nicks at a 
toehold length just outside of the domain. For the staggered cutting of the end caps, 
though, it is not obvious that ‘L’ and ‘R’ would work together to produce a staggered 
double strand cut as indicated. Alternatively, two separate staggered-double-strand- 
cut enzymes need to be used there, with the stagger being the length of a toehold. 


This scheme came from a discussion with Yuan-Jyue Chen, after he pointed out 
that the placement of restriction binding sequences was problematic. 


5 https://www.aatbio.com/data-sets/restriction-enzymes-cut-sites-reference-table. 
Fig. 10 Restriction enzyme scheme for the crosstalking choice gate. Enzymes are shown below the 
DNA structures, aligned to their binding sequences, with cutting points indicated by ‘^’ (opposite 
strand nick) and ‘#’ (blunt double cut). Top left: common pattern for all signal domains, with the 
bottom strand pointing left. Top middle and right: patterns for the end caps in context. Bottom: 
patterns for the main structures in context; note how the top left pattern leads to opening up the 
central toehold. Note also that ‘L’ on a left-pointing strand cuts to the left, but on a right-pointing 
strand cuts to the right; similarly for ‘R’ and ‘B’ 
Computational Design of Nucleic Acid 
Circuits: Past, Present, and Future 


Matthew R. Lakin, Carlo Spaccasassi, and Andrew Phillips 


Abstract Over the past 40 years, significant progress has been made on the design 
and implementation of nucleic acid circuits, which represent the computational core 
of dynamic DNA nanotechnology. This progress has been enabled primarily by 
substantial advances in experimental techniques, but also by parallel advances in 
computational methods for nucleic acid circuit design. In this perspective, we look 
back at the evolution of these computational design methods through the lens of the 
Visual DSD system, which has been developed over the past decade for the design and 
analysis of nucleic acid circuits. We trace the evolution of Visual DSD over time in 
relation to computational design methods more broadly, and outline how these com- 
putational design methods have tried to keep pace with rapid progress in experimental 
techniques. Along the way, we summarize the key theoretical concepts from com- 
puter science and mathematics that underpin these design methods, weaving them 
together using a common running example of a simple Join circuit. On the occasion 
of the 40th anniversary of DNA nanotechnology, we also offer some thoughts on 
possible future directions for the computational design of nucleic acid circuits and 
how this may influence, and be influenced by, experimental developments. 


1 Past 


1.1 Visual DSD Origins 


The first paper on what would subsequently become the Visual DSD system was 
published in 2009 and presented a simple programming language for nucleic acid 
circuit design [1]. The language was developed by observing the current state-of-the-art 
in nucleic acid circuits, which at the time relied primarily on toehold-mediated 
DNA strand displacement [2, 3], a powerful technique for implementing enzyme- 
free molecular computation programmed by sequence-specific DNA hybridization. 
This involves an invading single strand of DNA displacing an incumbent strand 
hybridized to a template strand, and is mediated by a short, single stranded region 
of DNA referred to as a toehold. The Visual DSD system was implemented using 
the functional programming language F# [4, 5], which facilitated the translation of 
its theoretical underpinnings to program code, and was released soon after as a web 
application [6], which simplified user adoption. DNA strand displacement has since 
been used to implement a broad range of computational circuits in DNA including 
digital logic [7], artificial neural networks [8, 9], and distributed algorithms [10], 
among many others [2, 3]. 

A key aspect of the origins and subsequent development of the Visual DSD sys- 
tem was the application of fundamental concepts from computer science in general, 
and programming language theory in particular, to formally represent corresponding 
concepts from dynamic DNA nanotechnology. We contend that dynamic DNA nan- 
otechnology is therefore an embodiment of computer science in the truest sense, and 
illustrate this via examples of computer science methods that underpin the design 
and analysis of dynamic DNA nanotechnology systems. 


1.1.1 Formal Syntax and Operational Semantics 


Our work on the Visual DSD language was inspired by previous work on the use of 
process calculi to model biological systems. At the time it was already recognized that 
theoretical approaches originally developed to model concurrent computer systems 
could also be applied to model biological systems, given their inherent parallelism. A 
promising approach was the use of process calculi such as the pi-calculus [11], orig- 
inally developed to model mobile computing systems such as telecommunications 
networks. Pi-calculus processes can create fresh names, send and receive them over 
channels, and spawn new processes. The first work on biological modeling using 
the stochastic pi-calculus [12] was carried out by Regev et al. [13] and subsequent 
work developed an operational semantics for a stochastic pi-calculus programming 
language and its corresponding implementation [14, 15]. Operational semantics for- 
mally defines the meaning of programs by specifying what happens when programs 
are executed, typically using reduction rules that determine a set of transitions from 
one program to another. The set of valid programs is defined by a formal syntax. 

In the case of the pi-calculus, a simplified syntax can be defined as follows, where 
the options on the right, separated by short vertical bars, represent valid instances of 
the syntactic category on the left: 


$$P ::= \pi.P \;\mid\; (P_1 \mid P_2) \;\mid\; \nu x\, P$$
$$\pi ::= \overline{x}\langle y \rangle \;\mid\; x(z)$$

The definition states that a process $P$ can be an action $\pi.P$, which runs the prefix $\pi$ 
followed by $P$; a parallel composition $(P_1 \mid P_2)$, which runs $P_1$ in parallel with $P_2$; 
or the creation of a fresh channel $\nu x\, P$, which creates a fresh channel $x$ visible only 
to $P$. An action $\pi$ can be a sender $\overline{x}\langle y \rangle$, which sends the message $y$ over channel $x$, 
or a receiver $x(z)$, which receives a message $z$ on channel $x$. As is standard for syntax 
definitions, the syntax of a process is recursive, meaning that it can be unfolded to 
represent processes of arbitrary size. 

A simplified operational semantics can be defined using a reduction rule (1) to 
specify the conditions under which two parallel processes can communicate; the term 
can then be reduced to the result of the computation step. In this rule, if a sender 
process $\overline{x}\langle y \rangle.P$ runs in parallel with a receiver process $x(z).Q$, then the sender and 
receiver can communicate on channel $x$, after which $P$ and $Q$ continue running in 
parallel, with the message $y$ assigned to variable $z$ in $Q$, written $Q_{\{y/z\}}$. To allow 
multiple communicating processes to run concurrently, a rule (2) states that if $P$ can 
reduce to $P'$ then the same reduction can still be applied when $P$ runs in parallel 
with another process $Q$: 


$$\overline{x}\langle y \rangle.P \mid x(z).Q \;\longrightarrow\; P \mid Q_{\{y/z\}} \qquad (1)$$
$$\text{if } P \longrightarrow P' \text{ then } P \mid Q \longrightarrow P' \mid Q \qquad (2)$$


Full definitions of pi-calculus syntax and semantics are provided in [11, 15, 16]. 
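
To make rules (1) and (2) concrete, here is a minimal Python sketch of a one-step reducer for the simplified fragment above. Several things in it are assumptions made for illustration rather than part of the cited definitions: an inert process `Nil` is added so that examples can end, channel restriction is omitted, substitution is not capture-avoiding, and parallel composition is not treated as commutative, so the sender must appear to the left of the receiver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Nil:
    """Inert process (added here so that example processes can terminate)."""


@dataclass(frozen=True)
class Send:
    chan: str
    msg: str
    cont: object


@dataclass(frozen=True)
class Recv:
    chan: str
    var: str
    cont: object


@dataclass(frozen=True)
class Par:
    left: object
    right: object


def subst(p, var, val):
    """Replace free occurrences of the name var by val in process p."""
    if isinstance(p, Nil):
        return p
    if isinstance(p, Send):
        return Send(val if p.chan == var else p.chan,
                    val if p.msg == var else p.msg,
                    subst(p.cont, var, val))
    if isinstance(p, Recv):
        chan = val if p.chan == var else p.chan
        # A receiver that rebinds var shadows it in its continuation.
        cont = p.cont if p.var == var else subst(p.cont, var, val)
        return Recv(chan, p.var, cont)
    return Par(subst(p.left, var, val), subst(p.right, var, val))


def step(p):
    """Apply one reduction step if possible, otherwise return None."""
    if not isinstance(p, Par):
        return None
    left, right = p.left, p.right
    # Rule (1): a sender and a receiver on the same channel communicate.
    if isinstance(left, Send) and isinstance(right, Recv) \
            and left.chan == right.chan:
        return Par(left.cont, subst(right.cont, right.var, left.msg))
    # Rule (2): a reduction of the left component still applies in parallel.
    reduced = step(left)
    return None if reduced is None else Par(reduced, right)


# Send y on x, in parallel with: receive z on x, then send w on z.
example = Par(Send("x", "y", Nil()), Recv("x", "z", Send("z", "w", Nil())))
print(step(example))
```

After the step, the receiver's continuation sends on channel y, which shows how a received name can itself be used as a channel.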

The Visual DSD language was inspired by the pi-calculus syntax and operational 
semantics. This includes formally defining dynamic DNA nanotechnology species as 
processes that can be composed in parallel and can interact via shared complementary 
domains, similar to how processes in the pi-calculus can send and receive messages 
on channels. To this end, the Visual DSD language used a simple syntax to represent a 
class of linear heteropolymer structures that mapped to a large fraction of the nucleic 
acid circuits being designed at the time, both theoretically and experimentally. This 
included seesaw gates, proposed by Qian and Winfree [17], which fit this paradigm 
and were used in two seminal papers in 2011 to implement large-scale digital logic 
circuits [7] and artificial neural networks [8]. In addition, Soloveichik et al. showed 
that this class of structures could be used to encode arbitrary chemical reaction 
networks as DNA circuits [18]. The syntax for this class of structures [1, 15, 19] can 
be summarized as follows: 


P ::= A | C | (P1 | P2)          Strand, Complex or Parallel composition 
A ::= <S> | {S}                  Upper or Lower strand 
C ::= {SL'}<SL>[S]<SR>{SR'}      Segment with left and right overhangs 
    | C1 : C2                    Complexes joined on Lower strand 
    | C1 :: C2                   Complexes joined on Upper strand 
S ::= D1 ... Dn                  Sequence of Domains 
D ::= X | X^ | X* | X^*          Long or short Domain, or its complement 


The definition states that a Process P can either be a Strand A, a Complex C, or a 
parallel composition of Processes (P1 | P2). Strands are either Upper <S> or Lower 
{S} and contain a Sequence D1 ... Dn of Domains, where a Domain D may be a long 
domain (X), a short toehold domain (X^), or a complement of one of these, written 
X* and X^*, respectively. Complexes are comprised of double stranded segments 
[S] which may have Upper and Lower strand overhangs on either side and can be 
concatenated along their Upper or Lower strands. This syntax allows complexes 
to be written compositionally, resulting in a close correspondence between syntax 
and structure that facilitates reading and writing of programs by the user. Note that 
here we use Upper and Lower to refer to the position of strands in a 2D graphical 
representation, which is common practice and aids in the visualization of complexes. 

Using this syntax, we can program a simple Join circuit, which takes two inputs 
and produces one output as shown below, where the graphical representation is 
equivalent to the program code: 


[Figure: graphical representation of the Join circuit, showing the two input strands, the Join complex, and the Reporter complex.] 


( <tb^ b> | <tx^ x> | {tb^*}[b tx^]:[x to^] | <fl^>[x]{to^*} ) 


This circuit will be used as a running example throughout the remainder of the paper. 
It illustrates the syntax of strands and complexes, which are the two main types of 
DNA species, and their parallel composition. As stated in the above formal definition, 
a strand is represented as a sequence of domains enclosed in angle brackets, where 
the 3’ end of the strand is assumed to be on the right, represented graphically by 
an arrowhead, and a toehold domain is represented by appending the (^) character 
to the domain name. For example, <tb^ b> represents a strand consisting of the 
toehold domain tb^ followed by the domain b. Note that a DNA strand can also be 
represented as a sequence of domains enclosed in curly brackets, where the 3’ end 
of the strand is instead assumed to be on the left. For example, the strand <tb^ b> 
can also be written as {b tb^}. This is because strands are identical up to rotation 
symmetry, such that we can write the same strand either from left to right or from 
right to left. A complex is represented as a sequence of segments, where each segment 
is a double stranded duplex with overhanging upper or lower strands to the left 
or right. Complementary domains are represented by appending the (*) character 
to the domain name. For example, the code {tb^*}[b tx^]:[x to^] represents 
a complex consisting of two segments. The first segment {tb^*}[b tx^] 
represents a double stranded duplex [b tx^] consisting of the strand <b tx^> 
bound to its complementary strand {b* tx^*}, with an additional single lower 
strand overhang {tb^*} to the left of the duplex. The second segment [x to^] 
represents a double stranded duplex consisting of the strand <x to^> bound to its 
complementary strand {x* to^*}. These two segments are joined together along 
the lower strand by the operator (:). Although the textual representation of complexes 
is defined as a connection of segments, in reality the connection results in a single 
continuous lower strand, as can be seen by the graphical representation. More gen- 
erally, this is also the case for complexes joined along the upper strand, with the 
potential for multiple disconnected bottom strands. 
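
The strand conventions described above can be illustrated with a small Python sketch. It is not the Visual DSD parser: the helper names `parse_strand`, `complement`, and `is_toehold` are hypothetical, and only single strands are handled. It shows why <tb^ b> and {b tb^} denote the same strand once both are read in the 5’ to 3’ direction.

```python
def parse_strand(text):
    """Parse '<d1 d2 ...>' or '{d1 d2 ...}' into a tuple of domains, 5' to 3'."""
    text = text.strip()
    domains = tuple(text[1:-1].split())
    if text[0] == "<" and text[-1] == ">":
        return domains          # 3' end on the right: already 5' to 3'
    if text[0] == "{" and text[-1] == "}":
        return domains[::-1]    # 3' end on the left: reverse to get 5' to 3'
    raise ValueError(f"not a strand: {text!r}")


def complement(domain):
    """Toggle the trailing '*' that marks a complementary domain."""
    return domain[:-1] if domain.endswith("*") else domain + "*"


def is_toehold(domain):
    """A toehold carries '^', possibly followed by the complement marker."""
    return domain.rstrip("*").endswith("^")


print(parse_strand("<tb^ b>") == parse_strand("{b tb^}"))
print(complement("tb^"), is_toehold("tb^*"), is_toehold("b"))
```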
We also defined an operational semantics for the language, which formalizes 
how DNA species can interact. Using the syntax introduced above, we present two 
simplified rules: a Bind rule for toehold-mediated binding of strands, and a Migrate 
rule for rightward branch migration of an invading strand displacing an incumbent: 


Bind:     <L N^ R> | {L' N^* R'}  →  {L'}<L>[N^]<R>{R'} 

Migrate:  {L'}<L>[S1]<S R2>:<L1>[S S2]<R>{R'}  →  {L'}<L>[S1 S]<R2>:<L1 S>[S2]<R>{R'} 


In the rules above, the symbols L, L', L1 and R, R', R2 each match a (potentially empty) 
sequence of domains. In the Bind rule, the upper strand <L N^ R> binds to the lower 
strand {L' N^* R'} via the complementary toehold domains N^ and N^*, producing a 
single complex {L'}<L>[N^]<R>{R'}. A corresponding Unbind rule (not shown) would 
be similar, except that the direction of the reaction would be reversed. In the Migrate 
rule, the invading domains S that are unbound on the left-hand side of the rule become 
bound on the right-hand side of the rule, displacing the incumbent domains S that 
were previously bound. Given that the bound S2 domains are also present, the strand 
itself remains attached to the complex. A corresponding Migrate Left rule is also 
needed for migration in the other direction (not shown). Full definitions of the Visual 
DSD syntax and semantics are provided in [1, 15, 19], including additional rules that 
allow reductions to take place inside joined complexes and parallel compositions. 
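
As a rough illustration of how the Bind rule can be read as a pattern match, the Python sketch below applies it to two single strands given as lists of domain names. The functions `bind` and `show` are hypothetical helpers, not part of Visual DSD, and the sketch covers only the strand-to-strand case stated in the rule, not binding to overhangs of existing complexes.

```python
def bind(upper, lower):
    """Return every way the Bind rule applies to <upper> and {lower}.

    Each strand is a list of domain names in written (left-to-right) order.
    A result (Lp, L, N, R, Rp) stands for the complex {Lp}<L>[N^]<R>{Rp}.
    """
    results = []
    for i, dom in enumerate(upper):
        if not dom.endswith("^"):
            continue                      # only toeholds can initiate binding
        for j, other in enumerate(lower):
            if other == dom + "*":        # complementary toehold N^*
                results.append((lower[:j], upper[:i], dom[:-1],
                                upper[i + 1:], lower[j + 1:]))
    return results


def show(result):
    """Render a match in the complex syntax used in the text."""
    lp, l, n, r, rp = (" ".join(x) if isinstance(x, list) else x
                       for x in result)
    return f"{{{lp}}}<{l}>[{n}^]<{r}>{{{rp}}}"


for match in bind(["a", "t^", "c"], ["x*", "t^*", "y*"]):
    print(show(match))
```

With the upper strand <tb^ b> and the lower strand {tb^*}, the only match has N = tb, R = b, and all other sequences empty.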


1.1.2 Chemical Reaction Networks 


An additional contribution of the Visual DSD system was to formalize the compila- 
tion of a program, representing an initial set of DNA species, into a computational 
model describing how these species can interact with each other. We use the term 
compilation from computer science, where it denotes the translation of a higher-level 
language into a lower-level one — typically executable machine code. In the case of 
Visual DSD, the output of compilation is an executable kinetic model, formalized as 
a chemical reaction network (CRN) [20], which is defined as a set of reactions. Each 
reaction consists of a multiset of reactant species, a reaction rate and a multiset of 
product species. CRNs bear many similarities to Petri nets, a graph-based formalism 
for distributed systems that takes the form of a bipartite graph containing places (analogous 
to chemical species) and transitions (analogous to chemical reactions), which has 
also been used to model biological systems [21, 22]. Here, the species of the CRN 
are the species of the Visual DSD program, and the compilation rules are defined by 
the operational semantics of the Visual DSD language. This allows compilation of 
arbitrary programs drawn from an infinite set. In this way, Visual DSD mirrors the 
edit-compile-run cycle of traditional computer programming. Essentially, the pro- 
grammer uses their mental model of how DNA species interact to carefully design a 
program consisting of an initial set of species with an intended behavior. Compilation 
applies the rules of the Visual DSD language to the program to generate a CRN rep- 
resenting all possible interactions between species. The CRN is then executed using 
a chosen simulation algorithm for a particular set of initial conditions, resulting in an 
execution trace of the behavior of the species over time. The programmer can then 
revise the Visual DSD program if the observed behavior differs from the intended 
behavior. 

Compilation of DNA species to CRNs is achieved by applying the reduction 
rules in a recursive loop to enumerate all possible reactions. Briefly, the compilation 
algorithm [1, 19] works by starting with an empty set of processed species and a set 
of unprocessed species corresponding to the species initially present in the Visual 
DSD program. At each step of the loop, the algorithm removes a species from the 
unprocessed set and computes all possible unimolecular reactions together with all 
possible bimolecular reactions involving the existing processed species. The resulting 
reactions can in turn generate new species, which are added to the unprocessed set. 
This loop is repeated until the set of unprocessed species is empty. In this way, 
the algorithm enumerates all species and reactions that can be generated from the 
initial species. We also generalized this approach to compile and simulate programs 
expressed in different languages [15], including the stochastic pi-calculus. In general, 
this approach allows for potentially unbounded numbers of species, meaning that 
compilation of Visual DSD programs may not terminate. As with most programming 
languages, checking for termination is not possible in general so it is up to the 
programmer to develop programs that terminate. To help mitigate this, Visual DSD 
also provides a just-in-time mode that interleaves compilation with simulation (see 
Sect. 1.2.1). 
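
The worklist loop described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the Visual DSD implementation: species are opaque names, the reduction rules are passed in as two hypothetical callbacks (`unimolecular` and `bimolecular`), rates are omitted, and the `max_species` guard is an added safety net for programs whose enumeration would not terminate. The toy rule table uses the species names of the Join example, with `Join3` and `Waste` as made-up names for complexes the text leaves unnamed.

```python
def enumerate_crn(initial, unimolecular, bimolecular, max_species=1000):
    """Enumerate all species and reactions reachable from the initial species."""
    processed = []
    unprocessed = list(dict.fromkeys(initial))   # de-duplicate, keep order
    seen = set(unprocessed)
    reactions = set()
    while unprocessed:
        species = unprocessed.pop()
        # Marking the species as processed first lets it also pair with itself.
        processed.append(species)
        found = set(unimolecular(species))
        for other in processed:
            found |= set(bimolecular(species, other))
        for reactants, products in found:
            reactions.add((reactants, products))
            for product in products:
                if product not in seen:          # new species: queue it
                    seen.add(product)
                    unprocessed.append(product)
        if len(seen) > max_species:
            raise RuntimeError("species limit reached; enumeration may not terminate")
    return processed, reactions


# Toy rule table standing in for the reduction rules (hypothetical).
TOY_RULES = {
    frozenset({"Input1", "Join"}): ("Join2", "Reverse"),
    frozenset({"Join2", "Reverse"}): ("Input1", "Join"),
    frozenset({"Input2", "Join2"}): ("Join3", "Output"),
    frozenset({"Join3", "Output"}): ("Input2", "Join2"),
    frozenset({"Output", "Reporter"}): ("Signal", "Waste"),
}


def no_unimolecular(species):
    return []


def toy_bimolecular(a, b):
    products = TOY_RULES.get(frozenset((a, b)))
    return [(tuple(sorted((a, b))), products)] if products else []


species, reactions = enumerate_crn(
    ["Input1", "Input2", "Join", "Reporter"], no_unimolecular, toy_bimolecular)
print(len(species), "species,", len(reactions), "reactions")
```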

The compiled CRN for our running Join circuit example is as follows: 


[Figure: compiled CRN for the Join circuit, drawn graphically. With the species labels used in the figure, the three reactions are:] 

Join + Input1  <->  Join2 + Reverse 
Join2 + Input2  <->  (unlabeled complex) + Output 
Reporter + Output  ->  (unlabeled complex) + Signal 
The first reaction in the forward direction is derived from the Bind rule, with 
N = tb, R = b, and L, L', R' = Ø. This is followed by an application of the 
Migrate rule, which displaces the b domain, with S1 = tb^, S = b, S2 = tx^, and 
L, L', L1, R2, R, R' = Ø. This is followed by a reaction to unbind the distal tx^ 
toehold, which is an application of the Unbind rule with N = tx, L = b and 
L', R, R' = Ø. In the figure above, these three reactions are merged into a single 
forward reaction, and similarly for the corresponding reverse reaction, resulting in 
the first reversible reaction shown above. The merge assumes that displacement and 
toehold unbinding reactions are much faster than toehold binding reactions, as defined 
by an Infinite DSD semantics [19], which we discuss further below. The other two 
reactions are derived similarly, where the final reaction is irreversible. 

The behavior of the CRN can be summarized as follows, where strands and complexes 
are named for convenience. In the first reversible reaction, the Input1 strand 
<tb^ b> binds to the Join complex {tb^*}[b tx^]:[x to^] to produce 
a Reverse strand <b tx^> and an intermediate Join2 complex. In the second 
reversible reaction, the Input2 strand <tx^ x> binds to the Join2 complex 
to produce an Output strand <x to^>. In the third reaction, the Output 
strand binds to the Reporter complex <fl^>[x]{to^*} to produce a Signal 
strand <fl^ x>, whose fluorophore emits light that can be measured. Note that 
the Output is only produced if both inputs are present, which corresponds to the 
desired Join circuit behavior. This CRN can then be simulated and analyzed using a 
range of computational methods that facilitate nucleic acid circuit design, as outlined 
below. 
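
As one example of such a simulation, the sketch below runs a basic stochastic (Gillespie-style) simulation of the three Join reactions in plain Python. The rate constants, molecule counts, and the names `Join3` and `Waste` for the two unnamed product complexes are assumptions made for illustration; they are not values from the text or from Visual DSD.

```python
import random

# (reactants, products, rate constant); all rates are placeholder values.
REACTIONS = [
    (("Input1", "Join"), ("Join2", "Reverse"), 1.0),
    (("Join2", "Reverse"), ("Input1", "Join"), 1.0),
    (("Input2", "Join2"), ("Join3", "Output"), 1.0),
    (("Join3", "Output"), ("Input2", "Join2"), 1.0),
    (("Output", "Reporter"), ("Signal", "Waste"), 1.0),
]


def simulate(counts, reactions=REACTIONS, t_end=100.0, seed=0):
    """Simulate until t_end or until no reaction can fire; return final counts."""
    rng = random.Random(seed)
    state = dict(counts)
    t = 0.0
    while True:
        # Propensity of a bimolecular reaction between distinct species.
        props = [k * state.get(a, 0) * state.get(b, 0)
                 for (a, b), _, k in reactions]
        total = sum(props)
        if total == 0:
            break
        t += rng.expovariate(total)          # waiting time to the next event
        if t > t_end:
            break
        index = rng.choices(range(len(reactions)), weights=props)[0]
        (a, b), products, _ = reactions[index]
        state[a] -= 1
        state[b] -= 1
        for p in products:
            state[p] = state.get(p, 0) + 1
    return state


start = {"Input1": 10, "Input2": 10, "Join": 10, "Reporter": 10}
print(simulate(start).get("Signal", 0))
print(simulate({**start, "Input2": 0}, t_end=5.0).get("Signal", 0))
```

Because the reporter step is the only irreversible reaction, Signal can only accumulate, and it cannot appear at all unless both inputs are present.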


1.2 Visual DSD Evolution 


Building on our initial work, over time we sought to generalize the set of nucleic acid 
circuits that could be designed and analyzed with the Visual DSD system. This was 
motivated by corresponding experimental advances that implemented new types of 
nucleic acid structures and interactions not yet supported by Visual DSD. In addi- 
tion, even when the Visual DSD system was first created there were a number of 
published experimental systems that were not supported, including hairpin assem- 
bly systems [23], hybridization chain reaction systems [24], and the original DNA 
strand displacement tweezer system of Yurke et al. [25]. Furthermore, these limita- 
tions in syntax meant that certain structures, such as branching structures, that could 
potentially be generated even when starting from the subset of supported linear struc- 
tures, could not be represented. This further motivated the subsequent evolution and 
generalization of the Visual DSD syntax and semantics, which we outline below. 
1.2.1 Polymer Structures (2011) 


Some of our earliest work on expanding the set of nucleic acid structures supported by 
Visual DSD was to include unbounded polymer structures created by connecting mul- 
tiple nucleic acid complexes. This was achieved by adding new semantic rules to enable 
multi-stranded complexes to bind via complementary overhanging sequences on the 
ends of the complexes (but not partway along, which would result in the formation of 
tree-like structures). Such polymer systems had previously been shown to enable the 
direct representation of a Turing-complete stack machine in nucleic acids [26]. 

A practical issue with these types of structures is the possibility of generating 
an infinite CRN, since polymers can grow without bound during reaction enumera- 
tion. To address this, we introduced a just-in-time (JIT) enumeration algorithm for 
stochastic simulation [15], which compiled reactions on-the-fly as they became possi- 
ble during simulation. By interleaving reaction enumeration and simulation steps, the 
algorithm only enumerates the finite set of reactions that occur in a single stochastic 
simulation, rather than attempting to enumerate the infinite set of possible reactions. 

This approach draws on the notion of just-in-time compilation from computer 
science. In general, computer programs written in high-level languages are typically 
compiled into a lower-level language such as machine code, before they can be 
executed. This can be done ahead of time, such as for the C programming language, 
or at runtime while the program is being executed, such as for the Java programming 
language. This is called just-in-time compilation, where Virtual Machine bytecode 
such as Java bytecode is compiled into machine code at runtime, and it inspired our 
algorithm for stochastic simulation of potentially infinite CRNs [15]. 
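
The idea of interleaving enumeration with simulation can be sketched in Python on a toy polymer. Everything specific here is an assumption for illustration: a polymer is represented by its integer length, the single rule `reactions_for` lets a polymer take up one monomer 'M', all rates are equal, and only the sequence of reaction events is simulated (waiting times are omitted). A full enumeration of this rule set would never finish, since every length yields a longer one.

```python
import random


def reactions_for(species):
    """Hypothetical rule: a polymer of length n can take up one monomer 'M'."""
    if isinstance(species, int):
        return [((species, "M"), (species + 1,))]
    return []


def jit_simulate(state, steps, seed=0):
    """Run up to `steps` reaction events, enumerating reactions on demand."""
    rng = random.Random(seed)
    state = dict(state)
    enumerated = {}                  # species -> its reactions, filled lazily
    for _ in range(steps):
        # Compile reactions only for species that are actually present.
        for species in [s for s, n in state.items() if n > 0]:
            if species not in enumerated:
                enumerated[species] = reactions_for(species)
        candidates, weights = [], []
        for rules in enumerated.values():
            for reactants, products in rules:
                weight = 1
                for r in reactants:
                    weight *= state.get(r, 0)
                if weight > 0:
                    candidates.append((reactants, products))
                    weights.append(weight)
        if not candidates:
            break
        reactants, products = rng.choices(candidates, weights=weights)[0]
        for r in reactants:
            state[r] -= 1
        for p in products:
            state[p] = state.get(p, 0) + 1
    return state, enumerated


final, enumerated = jit_simulate({1: 1, "M": 5}, steps=10)
print(final[6], len(enumerated))
```

Only the species that actually occurred during the run were ever compiled, which is what keeps the enumerated set finite.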

We illustrate our approach by modifying our running Join circuit example as 
shown below, including replacing the double stranded to^ duplex with a single 
stranded hairpin: 


[Figure: graphical representation of the hairpin-based Join circuit.]

(<tb^ x> | <tx^ x> | {tb^*}[x tx^]:[x]{to^> | <tb^}[x]{to^*})


This results in a hairpin-based Join circuit capable of forming unbounded poly- 
mers. Below are some of the initial reactions from the resulting CRN, as enumerated 
by Visual DSD using JIT compilation: 


[Figure: initial reactions of the resulting CRN, shown graphically.]

The last reaction demonstrates the binding of two complexes to open up the tb^
hairpin, resulting in an exposed <tb^ x> strand that can in turn interact with the
Join complex to continue growing the polymer indefinitely. Note that here we omit
the hairpin closing reactions; however, these can be enabled as a type of leak reaction
(see Sect. 1.2.3 and the Visual DSD manual for details).

Using this approach, we were able to simulate and analyze a Turing-complete 
stack machine in Visual DSD, based on an efficient encoding of stacks as DNA 
nanostructures [27]. 


1.2.2 Semantic Abstractions (2012) 


We also sought to enhance the flexibility of Visual DSD by encoding a number of distinct
assumptions about DNA strand displacement kinetics as semantic rules. This
produced an abstraction hierarchy of modeling assumptions [19], ranging from a 
Detailed semantics that explicitly models all toehold binding, branch migration, and 
toehold unbinding steps as distinct reactions, to an Infinite semantics in which branch 
migration and toehold unbinding are assumed to be instantaneous. These assump- 
tions were implemented as options to the reaction enumeration algorithm. Infinite 
mode tends to be a good approximation at low concentrations, where unbinding and 
branch migration are substantially faster than binding and can be effectively mod- 
eled as infinitely fast. Detailed mode tends to be a better approximation at higher 
concentrations, though this comes with a higher computational cost. 

An example of a strand displacement step compiled in Detailed mode is as follows,
based on our running Join circuit example: 


[Figure: the strand displacement step in Detailed mode, shown as three reversible reactions with binding, unbinding, and migration rates k, u, and m.]


Here, binding, unbinding, and migration reactions are assumed to have finite rates
k, u, and m, respectively. In contrast, when the circuit is compiled in Infinite mode,
the above three reversible reactions are merged into a single reversible reaction:


[Figure: the same strand displacement step in Infinite mode, shown as a single reversible reaction.]


Here, the concentration of species is assumed to be sufficiently low that the rates 
of unbinding and migration reactions are infinite compared to the rates of binding 
reactions. As a result, strand displacement is assumed to take place in a single step 
that merges binding, migration, and unbinding. Since branch migration is assumed to be
infinitely fast, complexes are also considered equal up to branch migration. More generally, this
approach allows a circuit to be analyzed at varying levels of detail without needing
to modify the Visual DSD program, facilitating a trade-off between the accuracy of
the model and the computational cost of the analysis [19].
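
The concentration dependence of this trade-off can be made concrete with a rough calculation. The sketch below compares the pseudo-first-order binding rate k*c with the unbinding rate u. The value of k is the one quoted later in Sect. 1.3.2, while the unbinding rate and the 0.1 cut-off are hypothetical numbers chosen purely for illustration.

```python
def binding_to_unbinding_ratio(k, u, concentration):
    """Ratio of the pseudo-first-order binding rate k*c to the unbinding rate u."""
    return k * concentration / u

K_BIND = 0.003   # /nM/s, value quoted for k in Sect. 1.3.2
U_UNBIND = 0.1   # /s, hypothetical unbinding rate chosen for illustration
CUTOFF = 0.1     # hypothetical threshold below which binding is "much slower"

for c in (1, 10, 100, 1000):  # concentrations in nM
    ratio = binding_to_unbinding_ratio(K_BIND, U_UNBIND, c)
    regime = "Infinite mode plausible" if ratio < CUTOFF else "Detailed mode advisable"
    print(f"{c:5d} nM: k*c/u = {ratio:.2f} -> {regime}")
```

Under these assumed values, binding is much slower than unbinding only at the lowest concentrations, which is the regime where merging the three steps into one is a reasonable approximation.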


1.2.3 Leaks (2012) 


Experimental results were increasingly demonstrating that DNA strand displacement
circuits often did not function as intended, in many cases due to the presence
of leak reactions in which an invader strand displaces an incumbent at a low rate, 
in the absence of a complementary toehold domain. Such leaks are typically unin- 
tended because they do not follow a toehold-mediated reaction pathway and can be a 
major source of unwanted signal in experimental implementations of DNA circuits. 
In response to the growing body of experimental data demonstrating the occurrence 
of leaks, we extended the Visual DSD system to model these types of reactions [19]. 
We achieved this by extending the Visual DSD semantics with additional leak rules, 
which essentially correspond to versions of the branch migration rules in which no 
toehold is present and the invading strand is not part of the same complex as the 
incumbent. 

A leak reaction occurs when the nucleotides at one extremity of a bound strand 
spontaneously unbind, creating a short toehold that facilitates a strand displacement 
reaction. In our running Join circuit example, the Input2 strand can displace a 
bound Output strand from the Join complex, even in the absence of the Input1
strand. This happens when one or two nucleotides at the 5’ end of the bound Output 
strand spontaneously unbind, creating a short toehold that allows the x domain of 
the Input2 strand to displace the Output, as follows: 


[Figure: leak reaction in which the Input2 strand displaces the bound Output strand from the Join complex in the absence of Input1.]


While the leak rate ($10^{-9}\,\mathrm{nM}^{-1}\,\mathrm{s}^{-1}$) is typically several orders of magnitude slower
than the toehold-mediated strand displacement rate (k), over time it can still result in 
the accumulation of unwanted Signal strand even when only one of the two input 
strands is present, which is not the intended functionality of the Join circuit. More 
generally, this work highlights the potential of Visual DSD to model and predict 
experimental interactions that are not specifically intended by the circuit designer. 
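
To give a sense of scale, the sketch below uses the closed-form mass-action solution for a single bimolecular reaction A + B -> C to compare leak-driven and toehold-mediated product formation over one day. The leak rate of 1e-9 /nM/s, the use of k = 0.003 /nM/s from Sect. 1.3.2, and the initial concentrations are assumptions made for illustration; the real circuit has additional steps that are ignored here.

```python
import math

def product_formed(k, a0, b0, t):
    """Closed-form product of A + B -> C under mass action, valid for a0 != b0."""
    decay = math.exp(-(b0 - a0) * k * t)
    return a0 * b0 * (1 - decay) / (b0 - a0 * decay)

LEAK_RATE = 1e-9     # /nM/s, assumed leak rate
TOEHOLD_RATE = 0.003 # /nM/s, value quoted for k in Sect. 1.3.2
INPUT2 = 10.0        # nM, assumed initial Input2
JOIN = 100.0         # nM, assumed initial Join
ONE_DAY = 86400.0    # seconds

leak_day = product_formed(LEAK_RATE, INPUT2, JOIN, ONE_DAY)
toehold_day = product_formed(TOEHOLD_RATE, INPUT2, JOIN, ONE_DAY)
print(f"leak: {leak_day:.3f} nM, toehold-mediated: {toehold_day:.3f} nM")
```

Under these assumptions the leak produces a little under 0.1 nM of unwanted product per day, small compared with the 10 nM produced by the intended pathway but not negligible for sensitive readouts.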


1.2.4 Localized Components (2014) 


Another important development in the field of dynamic DNA nanotechnology was the 
design of spatially localized circuits on DNA origami tiles, first computationally [28, 
29] and later experimentally [30, 31]. This approach combines aspects of dynamic 
and structural DNA nanotechnology to improve both the speed and scalability of 
circuit designs. To incorporate this new development, we generalized the Visual DSD 
system by introducing syntax to represent nanostructures tethered to a tile surface
[32]. Our initial approach was to tag tethered components with labels that indicate 
which components are tethered close enough to interact, and to model the effects of 
locality on reaction kinetics by associating a local concentration with each tag. This 
concentration was either estimated by the user or inferred from experimental data. 
We separately showed that SMT-based constraint solving can be used to determine 
satisfiability of the geometric constraints inherent in tethered molecular structures 
[33]. More recently, we used simple biophysical models to computationally estimate 
the rate constants for such localized reactions [34]. 

We illustrate our approach with a modified version of our running Join circuit 
example: 


[Figure: reactions of the localized Join circuit, in which the Join and Reporter complexes are tethered with location tag 1; the final reaction between the tethered complexes has rate (10000*k).]


The Join species is modified by replacing the double-stranded to^ duplex with
a single-stranded to^ hairpin and by adding a tether with location tag 1 on the 3’ end
of the tb^* overhang. In addition, the Reporter is modified so that it contains a
tether with the same location tag 1 on the 5’ end of the to^* overhang. This models
the assumption that the two species are tethered close to each other, such that their 
effective concentration is given by 1, here assumed to be 10,000 nM. In practice, these 
local concentrations can be estimated from data using parameter inference methods 
[30]. In the resulting CRN, the freely diffusing Input1 strand <tb^ b> binds 
to the Join complex tethered to the tile and displaces the <b tx^> strand. The 
freely diffusing Input2 strand <tx^ x> then opens the hairpin, exposing the to^ 
toehold. This then binds to the exposed to^* toehold of the tethered Reporter
complex and displaces the Signal strand <f1 tx^> . The interaction is scaled 
by the local concentration 1, since the Reporter and Join complex are tethered 
in close proximity to each other. Importantly, the resulting scaled rate 10000*k of 
this reaction is unimolecular with units $\mathrm{s}^{-1}$, since it involves two complexes tethered
to the same origami at fixed locations, and therefore the interaction between these 
two tethered complexes is not affected by the concentration of strands in solution. 
This approach was used to model the kinetics of localized logic circuits, by inferring 
local concentrations from experimental data [30]. 
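
The scaling of the rate by the local concentration amounts to a one-line calculation, sketched below. The value k = 0.003 /nM/s is borrowed from Sect. 1.3.2 and the 100 nM bulk concentration is an assumed comparison point; neither is specific to the localized circuit.

```python
import math

K = 0.003              # /nM/s, bimolecular rate constant (value from Sect. 1.3.2)
LOCAL_CONC = 10_000.0  # nM, local concentration associated with location tag 1
BULK_CONC = 100.0      # nM, assumed concentration of a freely diffusing partner

local_rate = K * LOCAL_CONC   # unimolecular rate between tethered complexes, /s
bulk_rate = K * BULK_CONC     # pseudo-first-order rate in solution, /s
half_life = math.log(2) / local_rate

print(f"local: {local_rate:.1f} /s, bulk: {bulk_rate:.2f} /s, "
      f"speed-up: {local_rate / bulk_rate:.0f}x, half-life: {half_life * 1000:.0f} ms")
```

Because the local rate depends only on the tag's concentration, it stays the same however dilute the surrounding solution is.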


1.2.5 Custom Reactions (2014)


Throughout this period, a range of architectures for dynamic DNA nanotechnology 
were continually being developed, including some that made use of not only DNA 
strands but also DNA or RNA enzymes. Two prominent examples include the PEN 
DNA toolbox, developed in the Rondelez lab [35, 36], and the Genelet system, devel- 
oped in the Winfree lab [37] and subsequently refined in the Schulman lab [38]. Other, 
related enzyme-driven architectures have also been developed [39, 40]. These archi- 
tectures rely on DNA or RNA enzymes such as polymerases, exonucleases, and nick- 
ases to implement computational circuits. The PEN DNA toolbox, in particular, has 
a suite of multiple software tools for circuit design and analysis [41-43]. Since the 
enzymatic components of these systems were not supported in the Visual DSD system, 
which solely modeled nucleic acid interactions, we developed an extension to enable 
the insertion of custom, user-specific chemical reactions into the compiled CRN model. 

A simple example of a custom reaction is shown below, by modifying our running 
Join circuit example. 


[Figure: the Join circuit with custom reactions for the production and degradation of Input1, and a Reporter reaction with custom rate r.]


This models the assumption that the Input1 strand <tb^ b> is produced at rate
0.1 and consumed at rate 0.01 by enzymatic synthesis and degradation, respectively. 
Note that production is modeled here with a constant rate, hence no reactant is 
specified. Since Visual DSD does not support an explicit representation of enzymes, 
their effects are modeled at an abstract level using custom reactions. Furthermore, in 
the Join circuit example the default rate of the strand displacement reaction is equal 
to the rate k associated with the toehold to^ that mediates the reaction. However, 
in practice we may wish to allow different reactions mediated by the same toehold 
to take place at different rates, in this case to model the fact that the presence of 
the fluorophore on the reporter complex gives rise to a different strand displacement 
rate r. We used custom reactions such as these to demonstrate the modeling of 
feedback control circuits in Visual DSD with three different architectures: strand 
displacement reactions, Genelets, and the PEN DNA toolbox [44]. 
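
The effect of the production and degradation reactions on Input1 can be worked out in isolation. The sketch below assumes zeroth-order production, first-order degradation, and units of nM/s and /s for the two rates; it ignores consumption of Input1 by the Join complex, so it is only a rough guide to the level that Input1 would settle at.

```python
import math

PRODUCTION = 0.1    # production rate, assumed units nM/s
DEGRADATION = 0.01  # degradation rate, assumed units /s

def input1_level(t, x0=0.0):
    """Solution of dx/dt = PRODUCTION - DEGRADATION * x with x(0) = x0."""
    steady = PRODUCTION / DEGRADATION
    return steady + (x0 - steady) * math.exp(-DEGRADATION * t)

steady_state = PRODUCTION / DEGRADATION
for t in (0, 100, 500, 1000):
    print(f"t = {t:4d} s: Input1 = {input1_level(t):.2f} (steady state {steady_state:.1f})")
```

With these rates the level relaxes toward production divided by degradation, with a time constant of 1/0.01 = 100 s.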


1.2.6 Complex Topologies (2016) 


The above versions of Visual DSD were all based on the original underlying syntax 
of linear polymers. By this time, however, experimental techniques using complex
branching structures were becoming increasingly common, and these were not supported
by this syntax. To keep pace with these experimental techniques, we developed
a new version of Visual DSD that supported arbitrary graph structures, based on 
the notion of strand graphs [45]. Graphs consist of nodes connected by edges and 
have found application in many areas of computer science, such as modeling the 
structure of the Internet. We note that other work had previously used graphs to model 
DNA nanostructures [46-48]. Site graphs are a generalization of graphs in which
each node has multiple named sites and each edge connects two sites. Site graphs 
were already being used to model complex protein structures and their interactions, 
for instance by the Kappa language [49]. We proposed strand graphs [45] as a 
variant of site graphs in which the sites are ordered, where each node represents 
a strand and each site represents a domain, the order of the sites corresponds to 
the order of the domains in the strand, and an edge represents a bond between two 
complementary domains. The strand graph data structure is highly general, enabling 
arbitrary DNA nanostructures to be represented, including pseudoknots (structures 
with non-nested bonding patterns). We defined semantic rules that generalized the 
interactions between DNA strands to support these complex topologies, while also 
preserving all of the rules from previous versions of Visual DSD based on linear 
polymers [45]. For the remainder of this paper we refer to the Visual DSD syntax 
based on linear polymers as the classic syntax. 

The complexes from our running Join circuit example can be expressed in strand 
graph syntax as follows, where the graphical representation above corresponds to 
the textual representation below: 


[Figure: strand graph representation of the Input1, Input2, Join, and Reporter species, with bonds numbered 0 to 4.]


( <tb^ b> | <tx^ x>
| [ <b!0 tx^!1> | <x!2 to^!3>
  | <to^*!3 x*!2 tx^*!1 b*!0 tb^*> ]
| [ <fl^ x!4> | <to^* x*!4> ] )


The program consists of a multiset of strands, separated by the parallel compo- 
sition operator. As with the classic syntax there are two types of species, sin- 
gle strands and complexes. The single stranded Input1 species <tb^ b> and 
Input2 species <tx^ x> have the same textual representation as in the classic 
syntax, while the Join complex consists of two shorter strands <x!2 to^!1>
and <b!4 tx^!3> bound to a longer strand <to^*!1 x*!2 tx^*!3 b*!4
tb^*>, and the Reporter complex consists of a Signal strand <fl^ x!4> bound
to a Quencher strand <to^* x*!4>. Complexes are enclosed in square brackets,
assuming that the strands within the square brackets form a connected component, 
meaning they are all connected to each other via named bonds, with autogenerated 
numbers used for new bonds generated during reaction enumeration. Complexes are 
considered equal up to renaming of bonds, meaning that individual bond names do 
not matter provided they are distinct. Note that the textual syntax assumes that all 
strands are written from the 5’ end on the left to the 3’ end on the right, whereas in 
the graphical representation the longer strand is rotated to align with the two shorter 
strands. This reflects the fact that complementary strands bind anti-parallel to each 
other, where the 3’ end of one strand aligns with the 5’ end of the other. In practice the 
bonds can be omitted from the graphical representation for conciseness and improved 
readability, since they are only used to indicate connectivity. While the Join example
shown can also be represented in the classic syntax, the strand graph syntax can 
be used to represent much more complex structures, including branching structures 
and pseudoknots. Side-conditions on the corresponding inference rules serve to limit 
applications of the rules that could generate physically implausible structures, such 
as binding within tight hairpin loops [45]. To illustrate this, we used the strand graph 
version of Visual DSD to analyze nucleic acid circuits with branching topologies that 
were previously implemented experimentally. We refer the reader to Petersen et al. 
[45] for details of the strand graph semantics that supports arbitrary structures, and to 
Spaccasassi et al. [50] for a more general semantics in terms of logic programming, 
which we discuss in Sect. 2.1. 
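
The strand graph data structure and the notion of equality up to bond renaming can be sketched as follows. This is an illustrative encoding, not the one used inside Visual DSD: strands are lists of (domain, bond) pairs, and the canonical form sorts strands by their domain sequences, which is only unambiguous when those sequences are distinct.

```python
from collections import defaultdict

# A strand is an ordered list of sites (domain, bond); bond is None when unbound.
JOIN_A = [
    [("b", 0), ("tx^", 1)],
    [("x", 2), ("to^", 3)],
    [("to^*", 3), ("x*", 2), ("tx^*", 1), ("b*", 0), ("tb^*", None)],
]
# The same complex written with different bond names and strand order.
JOIN_B = [
    [("x", 2), ("to^", 1)],
    [("b", 4), ("tx^", 3)],
    [("to^*", 1), ("x*", 2), ("tx^*", 3), ("b*", 4), ("tb^*", None)],
]

def complement(domain):
    return domain[:-1] if domain.endswith("*") else domain + "*"

def bonds_are_valid(strands):
    """Each bond name must join exactly two sites with complementary domains."""
    ends = defaultdict(list)
    for strand in strands:
        for domain, bond in strand:
            if bond is not None:
                ends[bond].append(domain)
    return all(len(d) == 2 and d[0] == complement(d[1]) for d in ends.values())

def is_connected(strands):
    """True if all strands are linked, directly or indirectly, by bonds."""
    bond_sets = [{b for _, b in s if b is not None} for s in strands]
    reached, frontier = {0}, [0]
    while frontier:
        i = frontier.pop()
        for j, bonds in enumerate(bond_sets):
            if j not in reached and bonds & bond_sets[i]:
                reached.add(j)
                frontier.append(j)
    return len(reached) == len(strands)

def canonical(strands):
    """Sort strands by domain sequence, then renumber bonds by first appearance."""
    ordered = sorted(strands, key=lambda s: [d for d, _ in s])
    names = {}
    return [[(d, None if b is None else names.setdefault(b, len(names))) for d, b in s]
            for s in ordered]

print(bonds_are_valid(JOIN_A), is_connected(JOIN_A))
print(canonical(JOIN_A) == canonical(JOIN_B))
```

The two encodings of the Join complex differ in bond names and strand order but share a canonical form, which is what "equal up to renaming of bonds" means in practice.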


1.3 Visual DSD Analysis 


Over time, a number of model analysis capabilities were added to the Visual DSD 
system. This was facilitated by the formal underpinnings of Visual DSD including a 
well-established formal semantics, which allowed a range of computer-aided verifi- 
cation and analysis techniques to be brought to bear [51]. In this section we briefly 
outline some of these methods and illustrate their application using the running Join 
circuit example. 


1.3.1 Probabilistic Model Checking (2012) 


Model checkers can automatically verify whether a state-transition system satisfies 
a specification expressed in a temporal logic such as computation tree logic (CTL), 
which is a branching time logic. For example, the following CTL formula 


A[F "terminal" ] 


states that all possible paths (A) through the state space will finally reach (F) a 
“terminal” state. We can illustrate the application of this formula to the running Join 
circuit example, by analyzing the state space that is generated from an initial state 
containing one copy of each species. This state space can be represented graphically 
as follows: 


[Figure: state space of the Join circuit from an initial state containing one copy of each species, with the "initial" and "terminal" states labeled.]

Each state contains a multiset of species, and each transition between states corresponds
to the execution of a single reaction, where the rate of the transition is given
by the rate of the reaction multiplied by the number of copies of each species involved
in the reaction. Here the state space is relatively simple since only one copy of each
species is present initially; however, the number of states can grow exponentially with
the number of copies. The “initial” and “terminal” states are labeled and outlined in 
bold using black and red colors, respectively, where the terminal state is defined as a 
state with no outbound edges, meaning that no reactions are possible from this state. 
Here we can see by inspection that the system satisfies the above formula, since it 
always converges to the "terminal" state. 
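
The inspection can also be automated for small systems. The sketch below enumerates the state space of a hypothetical CRN abstraction of the Join circuit (two reversible displacement steps followed by an irreversible reporter step; the species names are labels introduced here) and checks that a terminal state is reachable from every reachable state. Because the first two steps are reversible, the check is a reachability property rather than a literal evaluation of the CTL formula; in a finite stochastic system it implies that the terminal state is eventually reached with probability 1.

```python
from collections import Counter

# Hypothetical abstraction of the Join circuit as a CRN.
REACTIONS = [
    (("Input1", "Join"), ("Join1", "Waste1")),
    (("Join1", "Waste1"), ("Input1", "Join")),       # reverse of the first step
    (("Input2", "Join1"), ("Join2", "Output")),
    (("Join2", "Output"), ("Input2", "Join1")),      # reverse of the second step
    (("Output", "Reporter"), ("Signal", "Waste2")),  # irreversible reporter step
]

def successors(state):
    """States reachable by executing one enabled reaction."""
    counts = Counter(dict(state))
    for reactants, products in REACTIONS:
        if all(counts[r] > 0 for r in reactants):
            nxt = counts.copy()
            nxt.subtract(reactants)
            nxt.update(products)
            yield frozenset((s, c) for s, c in nxt.items() if c > 0)

def explore(initial):
    """Enumerate the reachable state space as a dict: state -> list of successors."""
    start = frozenset(initial.items())
    graph, frontier = {}, [start]
    while frontier:
        state = frontier.pop()
        if state not in graph:
            graph[state] = list(successors(state))
            frontier.extend(graph[state])
    return start, graph

def reaches_terminal(graph, source, terminal):
    """Depth-first search for any terminal state starting from source."""
    seen, frontier = {source}, [source]
    while frontier:
        state = frontier.pop()
        if state in terminal:
            return True
        for nxt in graph[state]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False

start, graph = explore({"Input1": 1, "Input2": 1, "Join": 1, "Reporter": 1})
terminal = {s for s, outs in graph.items() if not outs}  # no outbound edges
n_transitions = sum(len(outs) for outs in graph.values())
all_reach_terminal = all(reaches_terminal(graph, s, terminal) for s in graph)
print(len(graph), n_transitions, len(terminal), all_reach_terminal)
```

Under this abstraction the state space has four states in a chain from the initial state to a single terminal state.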

For more complex state spaces we used the PRISM probabilistic model checker in 
combination with Visual DSD to verify a range of correctness properties expressed 
in a probabilistic temporal logic [52]. Using this approach, we showed how prob- 
abilistic model checking can identify design flaws resulting from circuit cross- 
talk, validate garbage collection schemes that clean up strands with exposed toe- 
holds, and compute the probability of reaching a consensus state in an approximate 
majority voting circuit [52]. This approach is tractable for relatively small num- 
bers of molecules, where stochastic effects are important, such as spatially local- 
ized molecules. For large numbers of molecules in solution, system behavior can 
often be viewed as deterministic, in which case an ordinary differential equation (ODE)
model is generally a suitable approximation.


1.3.2 Model Simulation and Parameter Inference (2013) 


Visual DSD was originally developed with built-in methods to simulate the behavior 
of a CRN over time and dynamically visualize the simulation output [6]. This imme- 
diate visual feedback of circuit behavior helped to accelerate the design of nucleic 
acid circuits. A standard stochastic simulator was implemented first, followed by a 
just-in-time stochastic simulator that supported unbounded CRNs [15]. A determin- 
istic solver was also added, which generated ordinary differential equation (ODE) 
models from the CRN according to standard mass action kinetics. This deterministic 
solver was used to incorporate Bayesian parameter inference using Markov Chain 
Monte Carlo (MCMC), allowing model parameters to be inferred from experimental 
data. The plots below show fitting of the model parameters to measured datapoints 
(left) and examples of marginal posterior distributions for parameter values obtained 
via Bayesian parameter inference (right). The rate k = 0.003/nM/s was inferred from 
the data points shown using simulations for initial concentrations of 10 nM Input1
and Input2, and 100 nM Join and Reporter.

[Figure: fit of the model to measured data points (left) and marginal posterior distributions of the inferred parameters (right).]


This parameter inference method was used to help design an experimental imple- 
mentation of a molecular consensus algorithm using two-domain DNA strand dis- 
placement reactions [10], and a spatially localized architecture for fast and modular 
DNA computing [30]. 
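
The deterministic simulation step can be sketched with a hand-written mass-action model. The CRN below is a hypothetical abstraction of the Join circuit (two reversible displacement steps and an irreversible reporter step, all at rate k), integrated with a simple forward Euler scheme rather than the solver used by Visual DSD; parameter inference itself is not shown.

```python
K = 0.003  # /nM/s, the inferred rate quoted above

def derivatives(c):
    """Mass-action rates for the assumed three-step abstraction of the Join circuit."""
    r1f = K * c["Input1"] * c["Join"]
    r1r = K * c["Join1"] * c["Waste1"]
    r2f = K * c["Input2"] * c["Join1"]
    r2r = K * c["Join2"] * c["Output"]
    r3 = K * c["Output"] * c["Reporter"]
    return {
        "Input1": r1r - r1f, "Join": r1r - r1f,
        "Join1": r1f - r1r - r2f + r2r, "Waste1": r1f - r1r,
        "Input2": r2r - r2f, "Join2": r2f - r2r,
        "Output": r2f - r2r - r3, "Reporter": -r3, "Signal": r3,
    }

def simulate(initial, t_end, dt=0.1):
    """Forward Euler integration; dt is small relative to the fastest rate (~0.3 /s)."""
    c = dict(initial)
    for _ in range(int(round(t_end / dt))):
        d = derivatives(c)
        for species in c:
            c[species] += dt * d[species]
    return c

initial = {"Input1": 10.0, "Input2": 10.0, "Join": 100.0, "Reporter": 100.0,
           "Join1": 0.0, "Waste1": 0.0, "Join2": 0.0, "Output": 0.0, "Signal": 0.0}
final = simulate(initial, t_end=5000.0)
print(f"Signal after 5000 s: {final['Signal']:.2f} nM")
```

In this abstraction Signal approaches the 10 nM supplied by the limiting inputs, and the sum of Input2, Output, and Signal is conserved throughout.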


1.3.3 Satisfiability Modulo Theories Solving (2013) 


Satisfiability modulo theories (SMT) solvers generalize Boolean Satisfiability (SAT) 
solvers by incorporating additional theories, such as theories for integer-valued or 
real-valued arithmetic. This approach can be used to verify functional properties
of nucleic acid circuits, including guaranteeing properties of the terminal states of a
circuit as a function of its initial state. One key advantage of using the SMT approach
is that the analysis can scale to millions of copies of a circuit in parallel, though this 
comes at the cost of not being able to analyze temporal or probabilistic properties of 
the circuit. Another key advantage is that circuits can be verified for arbitrary input 
conditions, as opposed to a single combination of inputs. 

We integrated the Z3 SMT solver [53] with Visual DSD and used this to prove 
the functional correctness of large-scale DNA strand displacement circuits [54]. We 
illustrate the method using the running Join circuit example, by verifying that this 
circuit satisfies the predicate $P_{\mathrm{AND}}(q_0, q)$ defined in [54]. The predicate specifies that
the system running from initial state $q_0$ to final state $q$ produces a final quantity of
output $q(O)$ that is the smaller of the two initial input quantities, $q_0(I_1)$ and $q_0(I_2)$:


$$P_{\mathrm{AND}}(q_0, q) \iff q(O) = \min(q_0(I_1), q_0(I_2))$$


The method also identifies the precise constraints on the initial conditions that are 
required for the specification to be satisfied, in this case that the initial number 
of Reporter and Join complexes must be greater than the initial number of 
Input1 and Input2 strands. This allows us to prove the correctness of the circuit 
for arbitrary inputs, provided the constraints are satisfied. The logical behavior of 
the AND circuit is formalized using a threshold $\theta$, where an input or output strand
represents a logical True if and only if the number of copies of the strand is greater
than $\theta$. We verify that the system satisfies the following formula, which specifies that
the final output value $q(O)$ being above a threshold $\theta$ is equivalent to both initial
input values $q_0(I_1)$ and $q_0(I_2)$ being above that threshold:


$$[q(O) > \theta] \iff [q_0(I_1) > \theta] \wedge [q_0(I_2) > \theta]$$


Together these formulae specify that the Join circuit will function as an AND 
logic gate on its two inputs. Using this approach, we verified the behavior of a 
4-bit square root circuit, together with the components used for its construction, 
even when millions of copies of the circuits are interacting with each other in 
parallel. This method is also applicable to the analysis of chemical reaction net- 
works with large species counts, to provide guarantees for arbitrary initial condi- 
tions, though without taking into account kinetic rates. This complements alterna- 
tive methods such as interactive theorem proving, which has been used to verify 
coupled phase transitions in population protocols [55]. We have also used SMT 
solving to check the geometric constraints inherent in localized molecular circuit 
interactions [33]. 
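
The link between the two formulae can be checked mechanically on a small scale. The sketch below is not an SMT encoding: it substitutes exhaustive enumeration over a small range of integers for the solver, and simply confirms that an output equal to the minimum of the inputs satisfies the threshold formula, while an output equal to the maximum does not.

```python
from itertools import product

def threshold_spec(i1, i2, out, theta):
    """[q(O) > theta] <=> [q0(I1) > theta] and [q0(I2) > theta]."""
    return (out > theta) == (i1 > theta and i2 > theta)

def holds_for_all(output_fn, bound=20):
    """Exhaustively check the threshold formula for inputs and thresholds below bound."""
    return all(threshold_spec(i1, i2, output_fn(i1, i2), theta)
               for i1, i2, theta in product(range(bound), repeat=3))

min_satisfies = holds_for_all(min)  # output behaves as the AND predicate requires
max_satisfies = holds_for_all(max)  # an OR-like output should violate the formula
print(min_satisfies, max_satisfies)
```

An SMT solver establishes the same implication symbolically for all integers, which is what allows the analysis to scale to millions of copies without enumerating states.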


1.3.4 Spatial Simulation with Partial Differential Equations (2014) 


To keep pace with the increasing number of nucleic acid circuits being implemented 
in a spatial context at the time [56, 57], we added a partial differential equation (PDE) 
solver to Visual DSD to model diffusive aspects of spatially heterogeneous circuit 
designs. Visual DSD can solve PDEs of the general form
$\frac{\partial c}{\partial t} = f(c) + D \nabla^2 c$, where $c$ denotes the species concentrations, $f(c)$ the reaction terms, and $D$ the diffusion rates.

We illustrate this spatial simulation method using the running Join circuit example, 
by specifying the initial placement of the two input species. The Input1 species 
is initialized with a value of 10 nM in a circular region of width 0.4 located at 
coordinates (0.3, 0.3), while the Input2 species is initialized in a similar fashion 
at coordinates (0.7,0.7), where the coordinates are expressed as fractions of the 
dimensions of the 2D surface. As in the well-mixed system, we assume 100 nM 
Join and Reporter complexes in solution throughout. We also include additional 
spatial directives, which specify zero flux boundary conditions, and that the two input 
species diffuse at rate 0.5. The 2D spatial domain is a square of edge 50 mm, with a 
grid resolution of 100 divisions in each dimension and a simulation time step of 1 s.
The heat maps below show the quantity of Signal using the 1-dimensional (left)
and 2-dimensional (right) PDE solver.

In both simulations we observe signal in the region of space where both inputs
overlap due to diffusion, consistent with the desired behavior of a spatial Join
circuit. This approach was used to model a number of DSD systems with non- 
trivial spatial dynamics, including an autocatalytic network, a predator-prey sys- 
tem, and a spatial molecular consensus algorithm [58]. It was later used to assist 
with the design of spatial DNA-based communication in populations of synthetic 
protocells [59]. 
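
A reaction-diffusion model of this kind can be sketched in one dimension with an explicit finite-difference scheme. The sketch makes several simplifying assumptions that are not taken from Visual DSD: the Join and Reporter complexes are in excess and are abstracted into a single reaction Input1 + Input2 -> Signal at rate k, the grid uses unit spacing with a dimensionless diffusion coefficient, and Signal does not diffuse.

```python
def laplacian(u):
    """Second difference with zero-flux (reflecting) boundaries and unit grid spacing."""
    n = len(u)
    return [(u[i - 1] if i > 0 else u[i]) + (u[i + 1] if i < n - 1 else u[i]) - 2 * u[i]
            for i in range(n)]

def simulate_1d(n=50, steps=2000, dt=1.0, diffusion=0.2, k=0.003):
    """Explicit Euler scheme; stable here because diffusion * dt <= 0.5."""
    a = [10.0 if 10 <= i < 20 else 0.0 for i in range(n)]  # Input1 placed on the left
    b = [10.0 if 30 <= i < 40 else 0.0 for i in range(n)]  # Input2 placed on the right
    s = [0.0] * n                                          # Signal, assumed immobile
    for _ in range(steps):
        la, lb = laplacian(a), laplacian(b)
        for i in range(n):
            r = k * a[i] * b[i]  # local reaction rate where the inputs overlap
            a[i] += dt * (diffusion * la[i] - r)
            b[i] += dt * (diffusion * lb[i] - r)
            s[i] += dt * r
    return a, b, s

a, b, s = simulate_1d()
print(f"Signal at centre: {s[24]:.3f}, at left edge: {s[0]:.3f}")
```

Signal accumulates near the middle of the domain, where the two diffusing inputs first meet, and the zero-flux boundaries conserve the total of Input1 plus Signal.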


1.3.5 Probabilistic Model Checking with the Chemical Master 
Equation (2015) 


While our previous probabilistic model checking approach was highly efficient for 
analyzing a system at a single point in time [52], we sought to substantially improve 
the efficiency of analysis over multiple time points. We achieved this through numeri- 
cal integration of the Chemical Master Equation (CME), which is a system of ordinary 
differential equations whose solution yields the probability that the system is in a 
given state over time. For circuits with large numbers of molecules, a standard deter- 
ministic simulator based on mass action kinetics can typically be used. However, for 
circuits with small numbers of molecules, such as most localized circuits, methods 
such as CME integration are significantly more accurate. 

We illustrate our method using the running Join circuit example. The plots below 
show the results of the analysis assuming 10 copies of the Input1 and Input2 
strands and 100 copies of the Join and Reporter complexes. The plots show 
a timecourse (left) of the mean (solid line) and standard deviation (shaded region) 
of the Input1 (red), Input2 (green), Output (blue) and Signal (yellow), 
and a heatmap (right) of the full probability distribution for the Signal strand 
over time, together with a histogram of the probability distribution at the final 
time point. 

This method is particularly well-suited to analyzing localized molecular circuits,
since each location typically contains a small number of molecules. In the example 
above, even though only 10 copies of each input are present initially, the state space 
consists of 286 states, including a single terminal state, and 1100 transitions between 
these states. We analyzed a localized circuit for computing the square root of a four- 
bit number, proved its correctness with respect to its functional specification and 
analyzed the extent to which the localized design improves both speed and scalability 
in comparison to well-mixed circuits [60]. 
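
The state and transition counts quoted above can be reproduced with a short enumeration, and the CME itself integrated for a small instance. The sketch assumes a three-step abstraction of the Join circuit (two reversible displacement steps and an irreversible reporter step), encodes each state by the net number of firings of each step, treats k as a per-copy stochastic rate, and uses forward Euler integration in place of a production-grade solver.

```python
def transitions(state, n_in, n_fuel, k):
    """Outgoing (next_state, propensity) pairs; state = net firings (r1, r2, r3)."""
    r1, r2, r3 = state
    candidates = [
        ((r1 + 1, r2, r3), k * (n_in - r1) * (n_fuel - r1)),  # Input1 + Join
        ((r1 - 1, r2, r3), k * (r1 - r2) * r1),               # Join1 + Waste1
        ((r1, r2 + 1, r3), k * (n_in - r2) * (r1 - r2)),      # Input2 + Join1
        ((r1, r2 - 1, r3), k * r2 * (r2 - r3)),               # Join2 + Output
        ((r1, r2, r3 + 1), k * (r2 - r3) * (n_fuel - r3)),    # Output + Reporter
    ]
    return [(s, a) for s, a in candidates if a > 0]

def enumerate_states(n_in, n_fuel, k):
    """Reachable state space as a dict: state -> outgoing transitions."""
    graph, frontier = {}, [(0, 0, 0)]
    while frontier:
        state = frontier.pop()
        if state not in graph:
            graph[state] = transitions(state, n_in, n_fuel, k)
            frontier.extend(s for s, _ in graph[state])
    return graph

def integrate_cme(graph, t_end, dt):
    """Forward Euler integration of dp/dt for every state, starting from (0, 0, 0)."""
    p = {s: 0.0 for s in graph}
    p[(0, 0, 0)] = 1.0
    for _ in range(int(round(t_end / dt))):
        dp = {s: 0.0 for s in graph}
        for s, outs in graph.items():
            for nxt, a in outs:
                flow = a * p[s]
                dp[s] -= flow
                dp[nxt] += flow
        for s in p:
            p[s] += dt * dp[s]
    return p

K = 0.003  # assumed per-copy stochastic rate, /s

big = enumerate_states(n_in=10, n_fuel=100, k=K)
n_states = len(big)
n_transitions = sum(len(outs) for outs in big.values())
n_terminal = sum(1 for outs in big.values() if not outs)

small = enumerate_states(n_in=1, n_fuel=1, k=K)
p = integrate_cme(small, t_end=2000.0, dt=1.0)
print(n_states, n_transitions, n_terminal, round(p[(1, 1, 1)], 3))
```

With 10 copies of each input and 100 copies of each fuel, this abstraction yields 286 states, 1100 transitions, and a single terminal state, matching the figures quoted in the text.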


1.3.6 Verification of CRN Equivalence (2015) 


A key goal in computer-aided design is to formally verify the correctness of systems 
in a modular way, such that verified components can be combined to build more 
complex systems that are correct by construction. We have illustrated how nucleic 
acid circuits can be modeled as CRNs that can be simulated and analyzed. However, 
CRNs are also a powerful means of specifying the intended behavior of nucleic acid 
circuits, where a high-level CRN specification is compiled to its low-level nucleic 
acid implementation [18]. An important challenge in the field has been to verify that 
a nucleic acid circuit is a correct implementation of a CRN specification. 

We developed a technique for proving correctness of CRN implementations [61] 
based on the concept of serializability from database theory, which requires that inter- 
leaved concurrent updates to a database must be equivalent to some serial schedule 
of those updates. While this proof technique has not yet been implemented in the 
Visual DSD system, we can demonstrate its application using our running Join cir- 
cuit example, by analyzing the state space diagram of the circuit from Sect. 1.3.1. 
We can prove that the Join circuit is a correct implementation of the CRN specification
Input1 + Input2 -> Signal, where the species in this CRN denote
the formal species of the Join circuit, and the Join and Reporter species denote 
the fuels of the circuit, defined as any species that must be present initially in order 
for the chemical reactions in the circuit to run to completion. The remaining species 
of the Join circuit are defined as either intermediates or waste. The proof relies on 
demonstrating that any trace generated by the circuit can be rewritten to produce a 
serial trace which corresponds to a valid execution of the CRN specification. For
this simple example we can show that the state space from Sect. 1.3.1 satisfies the 
required conditions for correctness from the theorems proved in [61], including that 
the terminal state is universally reachable and that every trace from the initial state 
has a commit reaction, which is an irreversible step where all formal reactants have 
been consumed before any formal products are produced. In this case the rightmost 
transition denotes a commit reaction. Thus, the Join circuit is a correct implementa- 
tion of the CRN specification. Furthermore, the correctness is preserved in arbitrary 
contexts provided certain constraints are satisfied, including that the only shared 
species are formal species, waste, and certain intermediates where the gate design 
can tolerate the presence of additional copies of that species. This obviates the need 
to check the correctness of the circuit under different initial conditions or when it is 
composed with other circuits, which is a key advantage of our approach. We used 
this method to verify two different nucleic acid implementations of a distributed con- 
sensus algorithm, specified as a CRN with multiple reactions, where each reaction 
implementation was verified separately and composed in a modular fashion. 

More recently, the Winfree group has reported a number of proof techniques based 
on pathway decomposition [62] and bisimulation-based approaches [63], which have 
been integrated into the Nuskell CRN-to-DSD compiler [64]. This provides an inte- 
grated system for designing and verifying CRN translation schemes. We refer the 
reader to Sect. 2.2 for further discussion of this work. 


2 Present 


2.1 Logic Programming Framework 


As nucleic acid circuits continued to increase in complexity, we sought to develop 
a unifying framework that could support not only complex nucleic acid topologies 
but also both DNA and RNA enzymes. In addition, we sought to develop a system 
that was flexible enough to encode a range of alternative modeling hypotheses and 
to readily incorporate new dynamic DNA nanotechnology implementation strategies 
developed in the future. This led us to rebuild the Visual DSD system based on a logic 
programming framework [50], which we refer to as Logic DSD. Importantly, Logic 
DSD subsumes and unifies all previous syntactic and semantic extensions to the 
Visual DSD system, by implementing a single rule-based abstraction. The challenge 
in this work was to develop a modeling language that not only embodied a specific 
semantics but was also sufficiently expressive to allow new user-specified semantic 
rules to be defined within the language itself. 

Logic programming is a powerful computational paradigm that originally found 
favor in the fields of knowledge representation and artificial intelligence. It allows 
the programmer to implement arbitrary rule sets in a framework that encapsulates 
the assumptions underlying their particular experimental system. Logic programs 
consist of collections of facts and clauses, which can be used in a proof search 
procedure to determine whether user-provided queries are satisfiable given those 
facts and clauses. As an example, the logic program 


human("Socrates").
mortal(X) :- human(X).


specifies that Socrates is human (a fact) and that if X is human then X is also mortal (a 
clause). These can be used by a resolution engine to prove that Socrates is therefore 
mortal. In Logic DSD we use logic programming over strand graph representations 
of nucleic acid nanostructures, by generalizing the strand graph syntax developed 
previously [45]. This enables models to be compiled by proof search on semantic rules 
written in a logic programming language, which encodes specific user assumptions 
about how reactions and their rates are generated [50]. 
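
The Socrates example can be mimicked in a few lines of Python. The sketch below is only an illustration of deriving new facts from facts and clauses: it uses naive forward chaining over single-argument predicates rather than the resolution-based proof search of a real logic programming engine, and the names saturate and provable are hypothetical.

```python
# Minimal forward-chaining sketch (illustrative only; not the Logic DSD engine).
# A fact is a (predicate, argument) pair. A clause (head, body) stands for
# head(X) :- body(X), with a single shared variable X.
facts = {("human", "Socrates")}
clauses = [("mortal", "human")]  # mortal(X) :- human(X).


def saturate(facts, clauses):
    """Apply the clauses repeatedly until no new fact can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in clauses:
            for pred, arg in list(derived):
                if pred == body and (head, arg) not in derived:
                    derived.add((head, arg))
                    changed = True
    return derived


def provable(query, facts, clauses):
    """A query holds if it appears among the derived facts."""
    return query in saturate(facts, clauses)
```

With these definitions, provable(("mortal", "Socrates"), facts, clauses) holds, whereas a query about an individual not mentioned in the facts does not.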

In Logic DSD, instead of the compilation rules being hard-coded within the lan- 
guage itself, the language includes a very general set of rules for proof search over 
nucleic acid structures [50]. This required the development of an equational theory 
of strands, which is a notion of equality between species that is required so that the 
system can determine the equivalence of candidate structures and those represented 
in the rules. This means that the rules that specify the desired semantics are pro- 
vided as part of the program, alongside the circuit design. We define a default set 
of rules that corresponds to the semantics of previous versions of the Visual DSD 
system; however, these can also be replaced by custom rules written by the user. This 
allows the user to potentially use different sets of rules for different types of cir- 
cuits, depending on the modeling assumptions and implementation strategies, such 
as whether specific enzymes are present. 

We illustrate Logic DSD using the running Join circuit example. The program code 
is the same as the strand graph code from Sect. 1.2.6, copied below for convenience: 


[Figure: graphical representation of the Join circuit]

( <tb^ b> | <tx^ x> | [ <fl^ x!4> | <to^* x*!4> ] |
  [<b!0 tx^!1> | <x!2 to^!3> | <to^*!3 x*!2 tx^*!1 b*!0 tb^*>])


In addition, this code now needs to be accompanied by logical rules that define its 
semantics. The rules make use of contexts that are matched to the program in order 
for the rules to be applied. A given process can be matched to a context with N holes, 
written C[]_1 ... []_N, where each hole represents a part of the structure that is matched 
to a rule and can be subsequently modified when the rule is applied. 

Each hole in a rule is filled by a pattern, which can be one of the following: a 
strand < S > containing a sequence S of domains or logical variables that match 
a domain; the 5’ end of a strand < S; the 3’ end of a strand S>; a sequence S of 
domains or logical variables; or a nick S1 > | < S2 between two strands, which 
denotes a break. When a context is matched to a process, the patterns in the holes 
are matched to the corresponding parts of the process and the variable C is matched 
to the rest of the process. This combination of contexts and patterns allows strand 
graph rewriting rules to be expressed. In addition, the use of predicates allows highly 
expressive conditions to be defined, which need to be satisfied in order for the rule 
to be applied. 

We illustrate two of the main rules and their application to the Join circuit example 
below, and refer the reader to [50] for complete definitions. The following bind rule 
defines the semantics of binding, and is accompanied by a corresponding graphical 
representation of the rule and its application to the Join circuit example: 


bind(P1, P2, Q, D!i) :-
  P1 = C1 [D], P2 = C2 [D'], compl(D, D'),
  Q = C1 [D!i] | C2 [D'!i], freshBond(D!i, P1|P2).
reaction([P1;P2], "kb", Q) :- bind(P1, P2, Q, _).

[Figure: graphical representation of the bind rule and its application to the Join circuit]


When this rule is applied to the Join circuit example we have the following 
matches, indicated by boxes with dashed lines, which allows binding to take place 
on the toehold tb^: 


P1=C1[tb^], P2=C2[tb^*], Q=C1[tb^!5] | C2[tb^*!5] 


The rule specifies that two processes P1 and P2 can bind provided P1 contains 
domain D and P2 contains domain D' such that D and D' are complementary, 
written compl(D, D'). If so, the resulting process Q replaces the matched domains 
with corresponding bound versions D!i and D'!i, respectively, and places the two 
processes in parallel, where i does not occur anywhere else in the process, written 
freshBond(D!i, P1|P2). The built-in reaction predicate is used as input 
to the built-in reaction enumeration algorithm, which generates a CRN in a similar 
fashion to the previous algorithm. Here the reaction predicate indicates that the 
reactants are processes P1 and P2, the product is Q and the rate of the reaction is kb. 
Note that in this case the rate of binding is fixed; however, the language also allows 
arbitrary functions to be used to compute the rates, including associating the rate to 
the toehold and its surrounding context, such as whether it is at the 3’ or 5’ end of a 
strand [50]. 
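
To make the effect of the bind rule concrete, the following Python sketch mimics it on a simplified data structure. It is an illustration under stated assumptions, not the Logic DSD implementation: a process is assumed to be a list of strands, each strand a list of (domain, bond) pairs, and the helper names complement, fresh_bond and bind are hypothetical. Contexts are not modeled explicitly; the loops simply try every pair of positions.

```python
# Illustrative sketch only: a process is a list of strands, and a strand is a
# list of (domain, bond) pairs, where bond is None for an unbound domain.
# All helper names are hypothetical and are not part of Logic DSD.

def complement(domain):
    """Toggle the trailing '*' that marks a complementary domain."""
    return domain[:-1] if domain.endswith("*") else domain + "*"


def fresh_bond(*processes):
    """Return a bond label that does not occur in any of the processes."""
    used = {b for p in processes for strand in p for _, b in strand if b is not None}
    return max(used, default=-1) + 1


def bind(p1, p2):
    """Yield each process obtained by bonding a free domain of p1 to a
    free complementary domain of p2, using a fresh bond label."""
    i = fresh_bond(p1, p2)
    for s1, strand1 in enumerate(p1):
        for k1, (d1, b1) in enumerate(strand1):
            if b1 is not None:
                continue
            for s2, strand2 in enumerate(p2):
                for k2, (d2, b2) in enumerate(strand2):
                    if b2 is None and d2 == complement(d1):
                        q1 = [list(s) for s in p1]
                        q2 = [list(s) for s in p2]
                        q1[s1][k1] = (d1, i)
                        q2[s2][k2] = (d2, i)
                        yield q1 + q2  # the two processes placed in parallel


# Input strand and join gate of the Join circuit (reporter omitted).
join_input = [[("tb^", None), ("b", None)]]
join_gate = [
    [("b", 0), ("tx^", 1)],
    [("x", 2), ("to^", 3)],
    [("to^*", 3), ("x*", 2), ("tx^*", 1), ("b*", 0), ("tb^*", None)],
]
products = list(bind(join_input, join_gate))
```

In this example the only free complementary pair is tb^ and tb^*, so a single product is generated. The fresh bond label is 4 here rather than 5 because the reporter complex is left out of the sketch.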
Similarly, the following displace rule defines the semantics of strand displace- 
ment: 


displace(P, Q, E!j, D!i) :-
  P = C [E!j D] [D!i] [D'!i E'!j],
  Q = C [E!j D!i] [D] [D'!i E'!j].
reaction([P], "kd", Q) :- displace(P, Q, _, _).

[Figure: graphical representation of the displace rule and its application to the Join circuit]


The holes in the graphical representation are ordered from left to right, then top 
to bottom. When this rule is applied to the Join circuit example we have the 
following matches: 


P = C [tb^!5 b] [b!4] [b*!4 tb^*!5] 
Q = C [tb^!5 b!4] [b] [b*!4 tb^*!5] 


This approach essentially replaces the operational semantics described in Sect. 1.1.1 
with an executable logic programming encoding that can be directly edited by the 
user and saved as part of the Visual DSD program. 
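
The displace rule can be sketched in the same simplified style. The code below is a hypothetical illustration, not the Logic DSD implementation: it assumes a process is a list of strands of (domain, bond) pairs, handles only the orientation shown in the rule (a bound domain E followed by a free domain D on the invading strand), and uses the bond numbering of the strand graph code, where the b domain of the gate carries bond 0.

```python
# Illustrative sketch only: a process is a list of strands, and a strand is a
# list of (domain, bond) pairs, where bond is None for an unbound domain.
# The function names are hypothetical and are not part of Logic DSD.

def complement(domain):
    """Toggle the trailing '*' that marks a complementary domain."""
    return domain[:-1] if domain.endswith("*") else domain + "*"


def displace(p):
    """Yield each process obtained by one displacement step matching
    [E!j D] [D!i] [D'!i E'!j] and rewriting it to [E!j D!i] [D] [D'!i E'!j]."""
    for s, strand in enumerate(p):
        for k in range(len(strand) - 1):
            (e, j), (d, free) = strand[k], strand[k + 1]
            if j is None or free is not None:
                continue  # need a bound E followed by an unbound D
            for other in p:
                for m in range(len(other) - 1):
                    (dc, i), (ec, j2) = other[m], other[m + 1]
                    if i is None or j2 != j:
                        continue
                    if dc != complement(d) or ec != complement(e):
                        continue
                    # Find the incumbent copy of D that currently holds bond i.
                    for u, incumbent in enumerate(p):
                        for n, (d2, i2) in enumerate(incumbent):
                            if d2 == d and i2 == i:
                                q = [list(x) for x in p]
                                q[s][k + 1] = (d, i)   # invader takes the bond
                                q[u][n] = (d, None)    # incumbent is released
                                yield q


# Join gate after the input has bound to the toehold tb^ via bond 5.
bound = [
    [("tb^", 5), ("b", None)],
    [("b", 0), ("tx^", 1)],
    [("x", 2), ("to^", 3)],
    [("to^*", 3), ("x*", 2), ("tx^*", 1), ("b*", 0), ("tb^*", 5)],
]
displaced = list(displace(bound))
```

Here the input strand takes over bond 0 on domain b, and the incumbent b domain becomes unbound. A fuller treatment would also cover displacement in the opposite orientation.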

More generally, we used this method to encode nucleic acid implementation strate- 
gies that make use of polymerase, exonuclease, and nickase enzymes. Our approach 
greatly simplifies the encoding of enzyme-based systems, by avoiding the need to 
manually encode each enzyme operation for all possible species. We also encoded 
implementation strategies that rely on localization to a substrate and complex nucleic 
acid topologies including branches and pseudoknots [50]. 

The set of compilation rules that can be expressed in this framework is extremely 
broad, as logic programming systems of this kind are known to be Turing-complete. 
This means that future experimental developments in dynamic DNA nanotechnol- 
ogy can be accommodated without rewriting the core of the Logic DSD system, 
simply by writing a new set of rules. In practice, however, expensive computations 
such as detailed biophysical simulations will likely prove impractical if used in rule 
definitions. Furthermore, proof search in logic programming is highly parallelizable, 
making it conceptually straightforward to harness the power of massively parallel 
computational hardware or cloud services to scale up proof search. In this way, the 
current version of the Visual DSD system could serve as a foundation for the design 
of future dynamic DNA nanotechnology systems based on experimental techniques 
that have yet to be conceived. 


2.2 Related Work 


Throughout the time that the Visual DSD system was under development, a num- 
ber of groups were developing computational design tools for DNA circuits. These 
approaches, and their temporal relationships to each other and to our own work, are 
summarized in Fig. 1. For a more detailed comparison of domain-specific languages 
for DNA circuit design we refer the reader to a review article on the topic [51]. 
Perhaps most notably, the Winfree group developed a range of tools over a num- 
ber of years that have been integrated into automated pipelines for computational 
nucleic acid circuit design. For example, the Piperine design pipeline [68] enables the 
high-level design of CRN-based circuit architectures [72] and their subsequent com- 
pilation into lower-level DNA strand displacement implementations and ultimately 
into nucleotide sequences suitable for experimental testing. 

The Nuskell system [64] is a compiler that converts abstract CRN designs into 
strand displacement implementations using CRN compilation schemes that can be 
specified by the user using a built-in domain-specific language for this task. The 
Nuskell system also incorporates verification capabilities, using approaches based on 
pathway decomposition [62] or bisimulation [63] to formally check that the generated 
strand displacement network is a correct implementation of the original input CRN in 
a rate-independent sense. This required the definition of a semantics for the domain- 
level representation so that a correspondence between domain-level species and the 
abstract species of the input CRN can be rigorously stated and verified. Nuskell 
uses the Peppercorn software [71] to enumerate non-pseudoknotted structures at the 
domain level. Peppercorn uses a set of rules similar in spirit to those of the Visual 
DSD system but expresses them in a pattern-based notation similar to “DU+” used 
by NUPACK [73] for concise specification of secondary structures. Of particular 
interest in Peppercorn is its explicit support for condensing reaction networks into 
simpler versions, in which multiple microstates can be combined into a single “resting 
state” connected only by relatively fast reactions. This is a more flexible approach to 
reaction merging than the system that we initially developed as part of the hierarchy 
of abstractions used in previous versions of the Visual DSD system [19]. 
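
One way to picture this kind of condensation is as a graph computation over the fast reactions. The sketch below is only an illustration and is not Peppercorn's actual algorithm: it assumes that a resting state can be read as a group of species that are mutually reachable via fast reactions and from which no fast reaction leads elsewhere, and the names reachable and resting_sets are hypothetical.

```python
# Illustrative sketch only (not Peppercorn's implementation). fast_edges maps
# each species to the species it can turn into via a fast reaction.

def reachable(start, fast_edges):
    """Return all species reachable from start via fast reactions."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for nxt in fast_edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def resting_sets(species, fast_edges):
    """Group species into sets that are closed under fast reactions.

    Assumption: a set is 'resting' if its members are mutually reachable
    and no fast reaction leaves the set."""
    reach = {s: reachable(s, fast_edges) for s in species}
    result = set()
    for s in species:
        component = frozenset(t for t in reach[s] if s in reach[t])
        if reach[s] == component:  # no fast exit from this component
            result.add(component)
    return result


# A and B interconvert quickly; C decays quickly (and irreversibly) to D.
example = resting_sets("ABCD", {"A": ["B"], "B": ["A"], "C": ["D"]})
```

In this example A and B collapse into one resting set, D forms its own, and C is treated as a transient species that belongs to neither.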

Other related tools include the Multistrand stochastic simulator that simulates 
DNA nanostructures at the single-base level [67, 74] and the KinDA [70] system 
that predicts kinetics and thermodynamics of domain-level designs after specific 
nucleotide sequences have been assigned to those domains. The Seesaw compiler 
developed by the Qian group [69] enables digital logic circuits to be compiled into 
seesaw gate strand displacement circuits [17] and enables circuits to be implemented 
using even unpurified oligonucleotides. The DyNAMiC Workbench system [66] was 
developed as a web-based tool to target the “port-based” design abstraction developed 
for hairpin assembly systems by Yin et al. [23]. Finally, a range of design tools have 
also been developed specifically for the enzyme-driven PEN toolbox system [41], 
including several that automate various aspects of network design [42, 43]. 

Taken together, these examples illustrate the evolution of computational tools, 
including those based on the Visual DSD system, over this period of rapid growth 
in the capabilities and complexity of molecular computing circuits that could be 
implemented experimentally using dynamic DNA nanostructures. 
2001 Early domain-level simulations of enzymatic DNA computers [65] 

2009 Abstract CRNs for molecular programming [20] 
Initial DSD language and system [1] 

2011 Visual DSD software tool [6] 
Linear gate polymers [27] 
Seesaw circuit compiler [7] 

2012 JIT compilation [15] 
Semantic abstractions and Leaks [19] 
Model checking [52] 

2013 Deterministic simulation and parameter inference [10] 
Satisfiability modulo theories (SMT) solving [54] 

2014 Tethered components to model spatially localised circuits [32] 
Spatial simulation with partial differential equations [58] 
Custom reactions to model DNA and RNA enzymes [44] 
PEN toolbox design software [41] 

2015 Probabilistic model checking with the chemical master equation [60] 
Verification of CRN equivalence [61] 
DyNAMiC Workbench [66] 
Multistrand base-level simulator [67] 
PEN toolbox design automation [43] 

Evolutionary design of PEN toolbox circuits [42] 

2017 Piperine design pipeline [68] 
Nuskell system for compilation and verification of CRNs [64] 
Seesaw compiler enabling unpurified components [69] 

2018 SMT-based analysis of molecular geometry [33] 
KinDA tool for domain-level design [70] 

2019 Logic DSD system [50] 
CRN verification via bisimulation [63] 
CRN verification via pathway decomposition [62] 

2020 Peppercorn reaction enumerator [71] 


Fig. 1 Timeline of developments of the Visual DSD system and related work for high-level design 
and analysis of nucleic acid circuits. Red: Evolution of DSD syntax and semantics. Blue: Analysis 
techniques applied to DSD models. Green: Related work 


3 Future 


We conclude with some thoughts on what the future may hold for the computational 
design of nucleic acid circuits, looking ahead to the next 40 years of dynamic DNA 
nanotechnology at the intersection of computational tools, laboratory experiments, 
and biomedical applications. 

Fig. 2 Potential future developments for computational modeling in dynamic DNA nanotechnology, 
with some key dependencies highlighted as arrows 


3.1 Computational Tool Integration 


Enhancing the integration of computational tools has the potential to substantially 
improve the design of dynamic DNA nanotechnology systems. For instance, the 
Winfree group and others have integrated multiple tools that span a range of activities 
required to design and implement an experimental system, via the development of 
pipelines such as Nuskell [64] and Piperine [68]. More generally, this highlights 
the power of approaches in which a number of distinct categories of tools, such 
as nanostructure design tools [75], coarse-grained simulation tools [76], sequence- 
level design and analysis tools [70, 73, 77-79], CRN compilers [64, 68], reaction 
enumerators [6, 50, 71], and verification tools [62, 63] can be used in conjunction 
to analyze different aspects of system design. 

Over time, tool integration will likely move from an ecosystem on a local machine 
to a cloud-based framework with interoperable services. The Visual DSD system 
adopted a web-based approach, with the most recent versions taking advantage 
of JavaScript compilation to allow execution directly in a browser on the client 
machine, eliminating the need for server compute infrastructure. While convenient 
for deployment, this limits the user to the power of their equipment. To facilitate 
remote execution on a cluster, a command-line executable of Visual DSD was also 
developed. While this facilitated increased computational performance, it came at 
the cost of usability, where the web-based graphical user interface was replaced 
with a command-line text interface, and integration with computational infrastruc- 
ture needed to be manually configured depending on the infrastructure being used. 
To enable both usability and scalability, we envisage a graphical user interface with 
a built-in option to either run locally or run on the cloud through a simple checkbox, 
unleashing the power of the cloud on demand without compromising ease of use. 
As more tools continue to move to the cloud, software integration will likely take 
place via web services that scale according to the computational resources required 
(Fig. 2). 

To enable tools from multiple groups to interoperate seamlessly, whether through 
local interfaces or cloud-based web services, common data interchange formats will 
also be needed. For example, the Synthetic Biology Open Language (SBOL) [80] 
has been developed for the synthetic biology community to share genetic circuit 
designs and is part of a family of interchange formats for systems and synthetic biol- 
ogy [81]. The Systems Biology Markup Language (SBML) [82] provides a common 
interchange format for kinetic models expressed primarily as chemical reaction net- 
works. To date, there are limited interchange formats for dynamic DNA nanotechnol- 
ogy. In future, as DNA nanotechnology and synthetic biology continue to converge, 
there may be an opportunity to extend the SBOL standard, or to create new stan- 
dards based on existing formats such as the dotparen notation for secondary structure 
or Cadnano JSON files for structural DNA nanotechnology designs. In addition to 
ongoing grassroots efforts, existing organizations such as the International Society 
for Nanoscale Science, Computation, and Engineering (ISNSCE) could play a coor- 
dinating role. 

There is also an opportunity to facilitate the development of computational tools 
for DNA nanotechnology through better programming language integration. Recent 
work on the NUPACK system included an application programming interface (API) 
for the Python programming language [77], thereby enabling programmatic access 
to tool features using a general purpose programming language. We have previously 
argued [51] that domain-specific programming languages offer powerful tools for the 
development of design systems for particular application domains, such as dynamic 
DNA nanotechnology. This view is not inconsistent with the desire for an integrated 
toolchain glued together by a general purpose scripting language such as Python. 
In Visual DSD, improved integration could be achieved by adding the ability to call 
out to external functions, for instance to determine rate constants and to enumerate 
reactions based on the output of external solvers such as the Z3 SMT solver [53], 
nucleic acid design tools such as NUPACK [78, 79], or coarse-grained biophysical 
modeling tools such as oxDNA [76]. From a language implementation perspective, 
this would require a foreign function interface (FFI) to retrieve the results of calls to 
external tools: this could build on previous work on functional logic programming 
systems [83] that integrate logic programming-style proof search with function calls. 
From a computational feasibility perspective, oxDNA simulations are often time 
consuming and can sometimes take several days, so relationships would most likely 
need to be determined first by oxDNA or experimentally and then encoded in the 
logic programming framework. 

Finally, the potential for machine learning to be applied to biodesign has been 
widely acknowledged [84], including for example the opportunity to use machine 
learning tools to predict DNA interactions from sequences [85, 86], using both 
feature-based and recurrent neural network-based approaches, and incorporating 
domain knowledge into the models. The significant computational power of modern 
GPUs could also be harnessed for computational design and simulation of DNA 
nanotechnology systems, as demonstrated by recent work applying differentiable 
programming to design gene regulatory networks [87]. As mentioned above, cloud 
computing resources could be harnessed for computational design by exploiting the 
highly parallelizable nature of proof search in logic programming systems such as 
Logic DSD. 


3.2 Experiment Integration 


Experimental techniques in DNA nanotechnology have continued to rapidly advance 
in the decade since Qian and Winfree reported their seminal “four-bit square root” 
strand displacement circuit [7]. As things stand, sequence design tools such as 
NUPACK [78, 79] are almost indispensable for designing such circuits. As circuit 
complexity continues to increase, computational design tools at the domain level will 
become increasingly valuable for debugging failure modes and tuning system dynam- 
ics. When it comes to analyzing error modes of these systems, such as unintended 
leak interactions between DNA strands, already we are reaching the limits of what 
is possible computationally, even for very simple circuits. For instance, leak analysis 
in Visual DSD can generate thousands of unintended reactions even for very simple 
circuits, due to the combinatorial explosion of variant structures produced via leaks, 
which can themselves undergo further leak reactions, recursively. Recent promis- 
ing developments have included the design of circuits that resist leak [88], which 
could facilitate the implementation of substantially larger circuits. More generally, 
the need for higher-level computational design tools will likely only grow as circuit 
complexity increases due to further improvements in experimental techniques. 

Furthermore, recent developments in areas such as enzyme-driven DNA circuits 
[40] hint at the future convergence of DNA nanotechnology with synthetic biology, 
which will likely open new areas for modeling the interactions of DNA-based molec- 
ular devices with biological and biochemical components, including in living cells. 
In terms of keeping pace with such developments, one goal of the newest Logic 
DSD system [50] is to provide headroom by adopting a Turing-powerful language 
in which to express the semantic rules used for compilation of structural models 
into kinetic ones. In principle, end-users can reprogram the behavior of this DSD 
compiler to implement the specific assumptions of their experimental system, which 
is embodied by the user-supplied rules rather than hard-coded into the semantics of 
the DSD compiler itself. 

In the longer term, there is a need to not only integrate the different modeling 
methodologies but also to track the evolution of knowledge over time, in the form of 
computational models. Previous work has investigated training models of nucleic acid 
circuits through parameter inference methodologies [10, 30]. In future we envisage 
an approach where computational models are at the center of the Design-Build-Test- 
Learn (DBTL) cycle, and continually updated over time as new experiments are 
performed. 
We also note the substantial potential for the integration of new and future experi- 
mental techniques for high-throughput testing and readout of DNA-based molecular 
circuits. Most notably, high-throughput sequencing methods and nanopore sequenc- 
ing may enable large-scale interrogation of the dynamics of DNA circuits, which has 
previously been limited to the observation of a relatively small number of signals 
due to spectral overlap issues inherent in the use of fluorescent reporters. Similar 
techniques could be used for larger-scale data collection, enabling more data to be 
gathered on the internal states of DNA circuits during execution. 

Future directions in experimental DNA nanotechnology could facilitate sys- 
tem design by improving the correspondence between computational modeling and 
experimental reality. For instance, a key issue in computational modeling of exper- 
imental systems is ensuring that the initial conditions of the experiment match the 
initial conditions of the simulation. In previous work, this has been achieved in a 
number of ways, including explicitly modeling some fraction of inactive gates to 
account for incorrectly assembled gates [10] or by adjusting the initial concentra- 
tions of components in the experimental system to make the fraction of active gates 
match that required by the circuit design [69]. Perhaps a future research challenge 
in experimental DNA computing could be to aim toward “Angstrom resolution” in 
dynamic DNA nanotechnology, similar to the goal set for the field of structural DNA 
nanotechnology [89]. This might be a more challenging task as it would involve not 
just precision in placement of atoms in the initial state of the system but also in 
the programming of the subsequent pathways taken when the system computes over 
time. A possible initial goal would be to reliably construct just the initial state of 
a system, in terms of filtering out incorrectly synthesized or misfolded structures 
at the single-molecule level. At present, this is largely done by annealing followed 
by manual purification via polyacrylamide gel electrophoresis (PAGE). However, 
this does not guarantee that every gate in the sample will necessarily be correctly 
formed, opening up the possibility of errors. Future advances in single-molecule 
analysis, such as imaging or nanopore sequencing, could enable structures the size 
of currently-used DNA strand displacement gates to be analyzed, and perhaps even 
sorted, to produce highly pure samples for use in molecular computing reactions that 
can more faithfully reproduce the starting conditions assumed in the models used for 
computational design. Alternatively, high-fidelity biological synthesis of both DNA 
[10] and RNA [90] strand displacement components could improve the purity of 
experimental systems. 


3.3 Computational Design for Practical Applications 


A key challenge for molecular computing as a field is to successfully demonstrate 
practical applications for the technologies that have been developed over recent 
years. One possibility here is the use of dynamic DNA nanotechnology within living 
organisms to sense and control biological networks [91, 92]. These could be used 
for cell labeling [93], for diagnostic applications [94], for therapeutic effects (e.g., 
by silencing specific genes) [95], or even to build theranostic (therapeutic and diag- 
nostic) systems that autonomously detect and treat diseases. Such applications in 
nanomedicine would exploit the innate biocompatibility of DNA nanotechnology to 
carry out computation within living cells, where traditional silicon-based micropro- 
cessors cannot operate. 

Previous work on implementing strand displacement reactions in living cells [92, 
96, 97], including more recently using heterochiral DNA nanotechnology [98, 99], 
brings this possibility closer. However, even with chemical modifications added to 
the DNA, the interactions of DNA-based molecular computing systems with living 
cells are complex. This highlights a potentially fruitful avenue of future work in 
the modeling and design of computational nucleic acid systems: to interface nucleic 
acid circuit models with whole-cell models of cellular processes [100], including 
predicting, and designing against, degradation of circuit components by nuclease 
enzymes and other forms of interference. 

Finally, over the next 40 years we anticipate that the fields of DNA nanotechnology 
and synthetic biology will continue to converge. In particular, CRISPR/Cas systems 
for RNA-guided gene editing [101] and transcriptional regulation [102] have recently 
been integrated with strand displacement reactions for programmable control of 
CRISPR targeting [103-106]. Similar approaches have used strand displacement 
reactions to regulate translation via RNA-based “toehold switches” [107, 108] and 
transcription via small transcription-activating RNAs (STARs) [109]. More gener- 
ally, the control of biological systems via “output interfaces” such as the CRISPR/Cas 
system is likely to be a substantial growth area for applications of dynamic DNA 
(and RNA) nanotechnology. We anticipate further opportunities to apply rule-based 
modeling tools such as Visual DSD to hybrid systems that span both fields. We 
also anticipate that integrating computational design tools for DNA circuits with 
tools specific to the application domain will be critical, paving the way for new and 
exciting applications of DNA nanotechnology over the next 40 years. 
Abstract Molecular programs use chemical reactions as primitives to process 
information. An interesting feature of many of these amorphous systems is their 
scale invariance: they can be split into sub-parts without affecting their 
function. In combination with emerging techniques to compartmentalize and manip- 
ulate extremely small volumes of liquid, this opens a route to parallel molecular 
computations involving possibly millions to billions of individual processors. In 
this short perspective, we use selected examples from the DNA-based molecular 
programming literature to discuss some of the technical aspects associated with 
distributing chemical computations in spatially defined microscopic sub-units. We 
also present some future directions to leverage the potential of parallel molecular 
networks in applications. 


1 Harnessing Parallelization in Chemical Reaction 
Networks 


1.1 D(R)NA-Based Deterministic Chemical Reaction 
Networks 


D(R)NA-based artificial reaction networks are man-made, rationally designed 
biochemical systems targeting specific dynamical or computational behaviors. They 
use DNA or RNA as a substrate, as nucleic acids have proved incredibly versatile 
for storing information and executing chemical computations. Nucleic acids can be 
commercially synthesized with user-designed sequences, they can be tracked and 
characterized in great detail (with fluorescence, sequencing, mass spectrometry, 
electrophoresis, etc.) and can be conjugated to a large variety of other molecules. 

These systems are based on modules (nodes or gates) that one can connect at 
will to create more-or-less arbitrary circuit architectures. This approach leverages 
the canonical programmability of DNA/DNA interactions—Watson-Crick rules— 
and the existence of generic reactions, which are agnostic to sequence. D(R)NA- 
based molecular programs take molecular signals as input, for example, the pres- 
ence/absence of particular species or the concentration of these species, and process 
through a cascade of reactions—in many cases including feedbacks and multiple 
intermediates—down to the production of particular output species. The concentra- 
tion of these output species, which can be monitored via physicochemical signals 
such as fluorescence, is then interpreted as the result of a computation. The computed 
function is encoded by the details of the reaction circuit, including its topology and the 
kinetic laws and parameters of the constituting reactions. A number of experimental 
toolboxes have been developed to experimentally assemble these architectures [1-6]. 
In principle, using these frameworks, one can basically set up a molecular system 
performing any computation of interest [7]. 

The fundamental building block of all reaction networking schemes is catalysis 
[8-10]. Catalysis allows one compound of the network to act on others, without 
being itself included in the transformation. For nucleic acid (NA)-based systems, 
catalysis is either related to mechanisms that can accelerate the slow reaction of 
strand exchange [4, 8, 9, 11] or to enzymatic manipulation of NA synthesis and 
degradation (e.g., polymerases, ligases and nucleases) [1, 2, 6]. 

In the case of enzymatic networking schemes, for example, genelets [1] use an 
RNA polymerase and one or two RNases, while the PEN toolbox [3, 6, 12] uses a 
DNA polymerase, a nickase and a DNA exonuclease. These enzymes can be seen 
as the hardware of a machine whose behavior is controlled by a DNA software 
consisting of synthetic templates. Although less explored than DNA-only systems, 
enzymatic approaches provide benefits in terms of compactness, ease of preparation, 
amplification folds and control of leaks. Importantly for the topic discussed here, 
some of them provide both a well-controlled kinetic bottleneck on the fuel consump- 
tion and an efficient chemical sink and are thus able to maintain an out-of-equilibrium 
state of long periods in a closed reactor. Demonstrated out-of-equilibrium DNA- 
encoded enzymatic networks include oscillators [3, 13, 14], inverters, memories 
and toggle switches [1, 12, 15]. Note that other NA-based molecular programming 
paradigms have been described and also experimentally validated [1, 2, 4, 7-9]. 

Chemical reaction networks (CRNs) share many formal similarities with other 
computational frameworks, such as electronic or mechanical circuits. They are 
modular, and computations are built by linking together simple computing modules 
(e.g., transistors in the case of electronic circuits and chemical reactions in the case of 
chemical reaction networks). However, contrary to these more classic approaches, 
CRNs run in a homogeneous, liquid medium. In the general case, CRN-encoding
compounds freely diffuse within this medium. Because of this, individual network 
connections within a CRN are not defined by spatial addresses, but rather by chemical
recognition rules. A link between A and B in the network means a wire between
the location of a single copy of A and a single copy of B for an electronic circuit, but,
for a CRN, it means a specific (strong) chemical interaction between the molecules 
of type A and molecules of type B, both present in multiple, independent copies. 
The standard DNA program runs in a test tube, that is, a typical volume of tens to
hundreds of microliters. One tube performs one computation, possibly a complex one (a
computation being defined here as a series of chemical reactions producing output 
species that depend on the input species). The computation is generally done only 
once, as in most cases resetting is not possible (i.e., regenerating the inputs if they 
have been consumed, erasing the outputs and restoring the intermediate species
that drove the reaction). CRNs operate on molecular concentrations, in the sense 
that data flows are instantiated by changes in the concentration of some species 
over time, and the output is typically represented by the concentration of one or 
more specific species. The concentration of most compounds participating in a CRN 
needs to be in the nanomolar range or above to get appreciable kinetics, because most 
programmable reactions, including DNA hybridization [16], are limited by diffusion 
below this value. These typical volumes and concentrations imply that molecular 
counts are extremely high, typically 10⁹–10¹³ copies of each specific molecular type.
The fact that computation by a bulk CRN involves the production of so many identical
molecules naturally leads one to consider the potential for parallel implementations.


1.2 CRNs Run on Inherently Parallel Processes 


CRNs controlled by mass-action kinetics (see below for stochastic effects in very 
small volumes) are scale-free. Their behavior does not depend on the size of the 
system. This is because concentrations are intensive thermodynamic variables, which
leads to an intriguing property: the dynamical or information-processing functions
of CRNs are independent of their size. For example, if the experimenter partitions a
circuit from one large container into two smaller ones, they will simply obtain two systems
that are dynamically and computationally identical to the parent reaction. In other 
words, the function has not changed, but the number of threads has been doubled. Let 
us assume as an example that the CRN was running an addressable bistable system 
[12, 17], in order to maintain one bit of memory storage. The splitting in that case 
will simply lead to having two independent copies of the same memory element. After
splitting, one of them can in principle be reset to a different state. Altogether, one
now has two bits of memory available. Given enough compartments, this operation
can be repeated over and over. 
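
The scale-free argument can be checked numerically. The following is a minimal sketch, not a model of any of the cited circuits: the reaction A + B -> C, the rate constant, the concentrations and the function name simulate_counts are illustrative assumptions. The kinetics are written in molecule counts, so that the volume appears explicitly, and the final concentration of C is compared for a bulk tube, half of that tube and a 100 pL droplet.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def simulate_counts(volume_l, conc_a_m, conc_b_m, k_m_s, t_end_s, dt_s=0.1):
    """Euler integration of A + B -> C, written in molecule counts.

    Returns the final concentration of C in mol/L.
    """
    n_a = conc_a_m * AVOGADRO * volume_l
    n_b = conc_b_m * AVOGADRO * volume_l
    n_c = 0.0
    # Bimolecular rate constant converted from per-molar to per-molecule units.
    k_count = k_m_s / (AVOGADRO * volume_l)
    steps = int(round(t_end_s / dt_s))
    for _ in range(steps):
        reacted = k_count * n_a * n_b * dt_s
        n_a -= reacted
        n_b -= reacted
        n_c += reacted
    return n_c / (AVOGADRO * volume_l)


# Assumed example values: 10 nM A, 20 nM B, k = 1e6 /M/s, 10 minutes.
bulk = simulate_counts(100e-6, 10e-9, 20e-9, 1e6, 600.0)      # 100 uL tube
half = simulate_counts(50e-6, 10e-9, 20e-9, 1e6, 600.0)       # half of the tube
droplet = simulate_counts(100e-12, 10e-9, 20e-9, 1e6, 600.0)  # 100 pL droplet
print(bulk, half, droplet)
```

The three concentrations agree up to floating-point rounding, because the volume cancels out when counts are converted back to concentrations. The sketch is deterministic and therefore says nothing about very small volumes, where molecule counts approach unity.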

Using microcompartmentalization techniques, where individual reactors’ 
volumes range from femtoliters to nanoliters, and using the above-mentioned test 
tube as starting material, it is possible to generate millions to billions of compart- 
ments. For example, the splitting of a bulk DNA-encoded enzymatic predator-prey 
oscillator into ~100 picoliter microdroplets, using a microfluidic approach, imme- 
diately generated many millions of independent networks, all displaying the same 
oscillatory behavior [18, 19].

This scale-freeness of CRNs is only true down to the stochastic limit, where
volumes get so small that molecular counts are close to unity (Fig. 1). In this range, the
continuous approximation breaks down for concentrations and kinetics, which cannot be
seen as deterministic anymore. Rather, they become discrete and stochastic: molecular
counts are randomly updated by integer increments, which fundamentally alters
the behavior of CRNs. For instance, in the deterministic regime and with conventional
chemistry, it is not possible for a chemical species to reach its null state in finite time
(i.e., its concentration cannot get to 0). This implies that an autocatalytic system can
always “regenerate” itself after dilution, because there are always some molecules
around to seed the reaction. Yet in the stochastic regime, a species can permanently
“crash” to zero with a nonzero probability. Keeping the nanomolar concentration
as a reference, the stochastic limit occurs around femtoliter volumes (the volume 
of a typical bacterium). Therefore, depending on compartment size, the splitting of 
molecular programs can either provide parallel circuits with classical behaviors or 
enable new, non-deterministic operations [20, 21]. 
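
As a rough numerical illustration of where the stochastic limit lies, the sketch below converts a concentration and a volume into a mean molecule count and uses the Poisson relation CV = 1/sqrt(mean) for the compartment-to-compartment variation. The function names and the list of volumes are hypothetical choices made for this example.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def mean_count(conc_molar, volume_liters):
    """Average number of molecules in a compartment."""
    return conc_molar * AVOGADRO * volume_liters


def poisson_cv(mean):
    """Coefficient of variation of a Poisson-distributed count."""
    return 1.0 / math.sqrt(mean)


# Assumed reference concentration of 1 nM, from test tube down to femtoliter.
for label, volume in [("100 uL", 100e-6), ("100 pL", 100e-12),
                      ("1 pL", 1e-12), ("1 fL", 1e-15)]:
    lam = mean_count(1e-9, volume)
    print(f"{label:>7}: mean count = {lam:.3g}, CV = {poisson_cv(lam):.3g}")
```

At 1 nM, a 100 pL droplet still holds tens of thousands of copies, whereas a femtoliter compartment holds less than one copy on average and its CV exceeds 1.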

To truly leverage the potential of CRNs for parallelism, one ideally wants to 
perform different sub-computations at each position of the distributed system. The 
way these sub-computations interact with each other through space (by free diffusion, 
through restricted diffusion, or not at all) will then define the various options for 
parallel molecular programming. 

In this short and non-exhaustive contribution, we review experimental works 
that have explored the parallelization of DNA-encoded mass-action CRNs, focusing 
in particular on microcompartmentalization and high-throughput strategies. Based 
on these selected experimental approaches, we also suggest and discuss possible 
applications and future directions for parallelized molecular computing.

Fig. 1 Splitting a sample containing a molecule at concentration C = 1 nM produces multiple
compartments with the same average concentration and a coefficient of variation CV = 1/√λ, where
λ is the average number of molecules per compartment. When the compartment volume reaches
down to the femtoliter, CV ~ 1, concentrations take discrete values and stochastic partitioning
leads to strong inhomogeneity between compartments


2 Creating Sub-Computations


A variety of approaches are suitable for the parallelization of molecular programs. 
Below, we group them into approaches that use impermeable physical boundaries 
(e.g., water-in-oil droplets), permeable boundaries or no boundaries at all (Fig. 2). 
Important characteristics associated with these technical choices are the size of the 
individual elements and the associated throughput (the number of compartments one 
can generate or manipulate in a standard experiment). 


2.1 No-Diffusion (Leak-Tight) Compartments


A number of tight microcompartmentalization approaches have been explored. The 
absence of transport across the boundaries ensures that each microcompartment 
behaves as a fully independent computing element. On the other hand, it makes 
the delivery of specific inputs to each individual reactor more problematic. In the 
case of dynamical systems (out-of-equilibrium CRNs), it also mandates that the 
encapsulated chemistry be autonomous. For example, although many chemical (non- 
DNA) oscillators are known, most require continuous influx and outflux of reactants 
in open reactors, and only a handful are able to display (pseudo-) limit cycles in batch 
mode. Fortunately, several autonomous DNA circuits with non-trivial dynamics have 
been reported over the years and provide useful test cases for compartmentalization 
approaches (Fig. 3).

Fig. 2 Parallel molecular computations can be classified according to the diffusion rules between
sub-units. a in the absence of boundaries, the behavior is governed by reaction-diffusion processes.
Competition between reaction and diffusion determines the elementary length scale. b semi-permeable
compartments, where the diffusion of molecular program components between partitions
is permitted or not depending on the permeability rules (for example, size cutoff), can allow the
emergence of new collective functions from simple sub-units. c impermeable compartments, where
no diffusion is allowed between compartments, open the way to pure parallel processing

Fig. 3 Experimental approaches for independent compartments. a water-in-oil emulsions
containing various DNA-encoded oscillatory networks, including genelets [22] and PEN-DNA 
circuits [18]. b microfabricated silicon chambers encapsulate IVTT-based functions, including 
oscillators [23]. c liposomes, used in this example as vehicles for IVTT-replication (IVTTR) gene 
amplification system [24] 


2.1.1 Emulsions 


Emulsions are made of a continuous organic phase that contains many small
aqueous droplets, which can be considered chemically independent (see [25,
26] for discussion of droplet-to-droplet exchange in some specific cases).
Micropartitioning a master mix using emulsions can be as simple as combining the aqueous
reaction mix with oil and surfactant, for example by vortexing. Simmel, Winfree
and colleagues used this approach to parallelize a genelet oscillator and reported 
the observation of microscopic droplets showing individual oscillatory behavior [22] 
(Fig. 3a). While many other emulsification techniques are possible, including the use 
of filters, mechanical mixers, bead shakers, sonication, etc., it is not yet clear how the 
harsh conditions they entail impact the macromolecules of the molecular programs. 
In addition, although they are very fast—with millions or billions of compartments 
created in seconds—these techniques do not provide precise control over the size 
distribution. 

An alternative is droplet microfluidics [27], in which reagents are controllably
mixed and encapsulated at the microscale. It offers better control over the size
distribution and a gentler emulsification process. Although the compartments are
produced in a sequential manner, generation rates reaching tens or hundreds of kHz can
be achieved, and parallel designs [28] with tens of thousands of nozzles have been
reported [29]. The use of microfluidic emulsions is well developed in the fields of
high-throughput directed evolution, combinatorial chemistry and droplet digital Polymerase
Chain Reaction (ddPCR) [30]. In the latter case, a number of commercial solutions
are available.


2.1.2 Liposomes 


Liposomes more closely mimic the architecture of biological cells. While droplets are 
made of an aqueous compartment dispersed in oil, liposomes are aqueous compart- 
ments surrounded by a lipid membrane (a phospholipid bilayer) and dispersed in an 
aqueous phase. There are thus two aqueous phases: one inside each liposome and one 
outside the liposome. Since unmodified lipid bilayers are typically impermeant to 
DNA strands or large proteins, liposomes present in principle an attractive alternative 
to droplets for microcompartmentalization of molecular programs. Their generation
is, however, more technical and delicate than that of droplets, especially when the size
distribution of liposomes must be controlled [31]. This may explain why liposomes have
been less explored for computational CRNs [32], although there are some examples
of liposome-compartmentalized complex biomolecular mixtures, generally designed 
as minimal artificial cells [33]. The group of Vincent Noireaux, among others, encap- 
sulated In Vitro Transcription and Translation systems (IVTT) based on cell extract 
to recapitulate protein expression [34]. Danelon et al. went a step further by showing 
that genetic replication, supported by in situ expression of genetically encoded enzy- 
matic activities, can be installed within IVTT-containing liposomes [24] (Fig. 3c). 
In principle, such minimal cells reproduce the basic functions necessary to enter a 
process of Darwinian evolution. 


2.1.3 Microchambers 


Microfabrication offers a top-down way to create microcompartments with precisely 
defined geometries, within which very small volumes of liquid can be enclosed [35]. 
This can be used for example to create partially confined reactors, able to support 
localized behaviors while still diffusively exchanging with a feeding tank. Bar-Ziv 
and colleagues leveraged this approach to carve microchambers and connecting channels
inside a silicon wafer [23] (Fig. 3b). The chambers are patterned at their bottom
with DNA brushes, which are dense monolayers of DNA strands coding for a gene
expression CRN with positive or negative feedback. Each chamber is connected to
a central feed channel that runs through the chip, acting as a chemical source and
sink. The feed channel bathes the chambers in a cell-free extract, which contains the
molecular machinery to transcribe and translate the gene encoded by the DNA brush.
This reaction is predominantly local, but species diffuse and escape over time to
compartments nearby, creating a spatial network of connected compartments. The
authors showed that the geometry of the channels and chambers controls the interplay
between reaction and diffusion. For instance, a positive feedback loop network
was unable to kick off when the channel connecting the chamber to the feed was 
too short. This was expected because the removal of signaling molecules dominated 
over their autocatalytic production. When the connecting channel was made longer, 
thereby decreasing the effective dilution rate, the positive feedback loop ignited 
and the chamber glowed green. The authors extended this approach to create small
spatial networks of microreactors, where each microreactor could contain a different
set of encoding DNA modules (here, gene-encoding templates) [36]. In principle,
this approach could allow the design of a variety of exotic dynamical behaviors, by 
controlling the relative position of each DNA node [37]. 

Note that some hybrid techniques, using both top-down micro-patterning and oil-water
interfaces, can be used to create regular arrays containing millions of extremely
small and monodisperse compartments (down to the femtoliter), compatible with 
biomolecular reactions [38]. 


2.2 Compartment-Free Approaches 


Spatially resolved CRN behaviors can also be observed without any physical boundaries.
This is well known in the community of nonlinear chemistry, which has studied
a number of beautiful reaction-diffusion systems [39, 40]. In another field, chemical
engineers are also interested in the emergence of localized behaviors, which in
some cases may have deleterious consequences. In the absence of physical boundaries,
whether or not local behaviors are possible depends on the relative strength of
chemical reactions and diffusion. This simple statement is captured by the so-called
Damköhler number [41], which compares reaction and transport rates:


Da = reaction rate/transport rate. 


Localized behaviors are possible when this dimensionless number is greater than
1, i.e., when reactions are faster than mixing. For diffusion-limited reactions involving
short oligos, the hybridization rate (which is mostly independent of sequence, length
and temperature) is in the order of 10⁶ M⁻¹ s⁻¹, whereas diffusion is around D ~ 10-100
µm² s⁻¹ (single- or double-stranded 10-100-mers) [42]. Knowing the concentrations
of gate or node molecules, one can then assess the spatial scale of localized
behaviors, in the absence of any mixing. For example, if the reaction rate is around
1 min⁻¹, this length scale is 10 µm; smaller reactors can be considered spontaneously
well-mixed, whereas larger ones may generate reaction-diffusion systems (Fig. 4).
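
The order-of-magnitude reasoning above can be written as a short sketch. It assumes that transport is purely diffusive, so that the transport rate over a distance L is taken as D/L² (a common convention, but an assumption here), and that the reaction rate is a pseudo-first-order rate obtained by multiplying the hybridization rate constant by a partner concentration. The function names and the 10 nM example concentration are hypothetical.

```python
import math


def pseudo_first_order_rate(k_hyb_per_m_s, partner_conc_m):
    """Reaction rate (1/s) for a bimolecular step at fixed partner concentration."""
    return k_hyb_per_m_s * partner_conc_m


def damkohler(rate_per_s, diffusion_um2_s, length_um):
    """Da = reaction rate / transport rate, with transport rate D / L^2."""
    return rate_per_s * length_um ** 2 / diffusion_um2_s


def critical_length_um(rate_per_s, diffusion_um2_s):
    """Length at which Da = 1, i.e. L = sqrt(D / rate)."""
    return math.sqrt(diffusion_um2_s / rate_per_s)


rate = 1.0 / 60.0  # about 1 per minute, as in the example in the text
for d in (10.0, 100.0):  # diffusion coefficients in um^2/s
    length = critical_length_um(rate, d)
    print(f"D = {d:5.1f} um^2/s -> L(Da = 1) = {length:.1f} um")

# Assumed example: 10 nM partner strand and k = 1e6 /M/s.
print(pseudo_first_order_rate(1e6, 10e-9), "per second")
```

With these assumptions the crossover length falls in the range of tens of micrometers. Prefactors depend on the exact definition of the transport rate, so only the order of magnitude is meaningful.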

Similar to the case of well-mixed systems, the unfolding of a reaction-diffusion
process can be interpreted as a computation, whose input will typically include
geometrical cues. Biology provides a number of examples, such as the creation
of concentration patterns during embryo morphogenesis via clock-and-wavefront,
French-flag or Turing mechanisms, which will later on trigger cellular differentiation.
At a simpler scale, the bacterial Min system performs an accurate determination of
the geometric middle of a rod-shaped container using reaction-diffusion processes
[43]. For molecular programming in reaction-diffusion settings (i.e., high Damköhler
numbers), one needs to control, in addition to topology, laws and rates, the diffusion
characteristics of each species, as well as the boundary conditions.

Fig. 4 Boundary-free spatial computations (reaction-diffusion systems). An initial state, taken as
input, is transformed into dynamic or stationary patterns. a a spot of trigger molecules (orange spot)
initiates the propagation of successive waves of species α (prey) and β (predator) in a predator-prey
molecular system [44]. b a morphogen gradient (purple) is interpreted by a tristable system to
create a three-stripe pattern, recapitulating the “French-Flag” embryogenesis [45]. c an incoherent
feedforward loop coupled to differential diffusion coefficients allowed the transformation of an
input pattern injected through UV-sensitive DNA modifications [46]

For example, Padirac et al. spatialized a predator-prey oscillator by enclosing
the corresponding DNA-encoded circuit in a flat chamber 200 µm thick and one
centimeter wide, installed on a microscope stage. This chamber was locally seeded
with prey, which created inhomogeneous initial conditions, and fluorescent reporters
were used to observe the concentrations of both dynamic species. Incubation revealed
traveling waves of prey, chased by waves of predators, reminiscent of their ecological
counterparts (Fig. 4a). The waves traveled at a velocity of hundreds of micrometers
per minute. These chemical waves result from the interplay between reaction
and diffusion, and they transport information (e.g., the concentration of chemical
species) over large distances (in the same way as light or electrical impulses
could). Their velocity can be roughly predicted from simple scaling arguments:
v = √(D/τ) ≈ 100 µm·min⁻¹.
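
To see how a front velocity of this order arises from reaction and diffusion alone, here is a minimal one-dimensional sketch. It is not the predator-prey model of the cited work: it assumes a generic autocatalytic species with logistic growth (a Fisher-type equation), a diffusion coefficient of 100 µm² s⁻¹ (6000 µm² min⁻¹) and a growth rate of 1 min⁻¹, and it interprets τ in the scaling argument as the inverse of that rate. The function name front_speed and all numerical parameters are illustrative choices.

```python
def front_speed(d_um2_min=6000.0, rate_per_min=1.0, dx_um=20.0, n_cells=300,
                dt_min=0.01, t_start_min=5.0, t_end_min=20.0):
    """Measure the speed (um/min) of an autocatalytic front on a 1D grid."""
    u = [0.0] * n_cells
    for i in range(5):          # seed the left end of the channel
        u[i] = 1.0
    alpha = d_um2_min * dt_min / dx_um ** 2   # must stay below 0.5 for stability

    def front_position(profile):
        # Position where the profile crosses 0.5, by linear interpolation.
        for i in range(1, n_cells):
            if profile[i] < 0.5 <= profile[i - 1]:
                frac = (profile[i - 1] - 0.5) / (profile[i - 1] - profile[i])
                return (i - 1 + frac) * dx_um
        raise ValueError("front not found")

    n_start = int(round(t_start_min / dt_min))
    n_end = int(round(t_end_min / dt_min))
    x_start = None
    for step in range(1, n_end + 1):
        new = u[:]
        for i in range(n_cells):
            left = u[i - 1] if i > 0 else u[i]             # no-flux boundary
            right = u[i + 1] if i < n_cells - 1 else u[i]  # no-flux boundary
            growth = rate_per_min * u[i] * (1.0 - u[i])
            new[i] = u[i] + alpha * (left - 2.0 * u[i] + right) + dt_min * growth
        u = new
        if step == n_start:
            x_start = front_position(u)
    return (front_position(u) - x_start) / (t_end_min - t_start_min)


speed = front_speed()
print(f"simulated front speed: {speed:.0f} um/min")
print(f"scaling estimate sqrt(D/tau): {(6000.0 / 1.0) ** 0.5:.0f} um/min")
```

The simulated speed and the scaling estimate differ by a factor of order unity, which is the level of agreement a scaling argument is expected to give.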

Similar PEN-toolbox traveling waves and fronts provide a useful primitive to build 
more complex spatial operations. Triggered at the entrance of a millimetric maze built 
of PDMS, a wavefront followed the walls of the channels, split into two at a junction 
and died when it reached a cul-de-sac [47]. The shape of the wavefront was sensitive 
to geometric cues such as the curvature of the channels. When the wavefront entered
a turn with a short curvature radius, it showed a transient dispersion where the front 
on the outer end of the turn progressed faster than the front on the inner end. This 
dispersion was not observed in turns with larger radii. 

Biological embryo development provides inspirational examples of how complex 
geometrical shapes can unfold from chemically encoded programs. The synthetic
version, artificial morphogenesis, thus represents an exciting frontier for molecular
programming. The challenge is ambitious, because, in addition to topology,
rates and laws, artificial morphogenesis requires the control of diffusion rates in a 
species-specific way. For example, Turing patterns generally require a marked differ- 
ence between the diffusion rate of activating and inhibiting compounds; in the field 
of small-molecule CRNs, a strategy to slow down the diffusion rate of inhibitory
compounds has been instrumental to the engineering of the first synthetic Turing
motifs [39]. For DNA-based approaches, the group of Estevez-Torres and Galas
at Sorbonne Université/CNRS introduced some strategies for this purpose, based
on immobilized binding (complementary) partners of the active DNA species [48]. 
They also studied the interaction between traveling fronts or confined them within 
guiding gels [49]. These efforts, reviewed in [50], culminated in an artificial recon- 
stitution of the famous “French-Flag” embryo patterning algorithm, a system that 
takes a shallow 1D chemical gradient as input and outputs three sharply demarcated 
spatial regions (Fig. 4b) [45]. 

The diffusional coupling between reacting voxels (i.e., elementary reaction 
volumes) in reaction-diffusion systems can also be leveraged to compute fine spatial
transformations (i.e., the transformation of a spatial pattern of concentrations into 
another pattern). Ellington and colleagues created a strand-displacement network 
that takes a 2D illumination pattern as input and outputs a species that delineates 
the edges of the pattern. This system is built around an incoherent feedforward loop 
(Fig. 4c) [46]; light both activates and inhibits the creation of a signal species. In areas
without illumination, nothing happens. The same is true for areas of homogeneous
illumination, where chemical activation and inhibition cancel each other. But, on the
edges of an illuminated area, species that were activated by light diffuse to nearby
dark areas and escape photoinhibition, thus revealing edges. In principle, this approach
could be generalized to more complex image processing and be connected to other 
chemical (like polymerization) or biological (like cell culture and differentiation) 
processes. This could enable wholly new ways to orchestrate complex chemistry or 
biology using spatially structured illumination to coordinate and manage complex 
processes. 
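
The edge-detection logic described above can be mimicked with a deliberately crude one-dimensional toy model. It is not the strand-displacement network of the cited work. It assumes that light produces an activator in proportion to the illumination, that the activator diffuses for a fixed time, and that inhibition is local and complete wherever the light is on. The function names, grid size and diffusion parameters are hypothetical.

```python
def diffuse(profile, alpha, steps):
    """Explicit 1D diffusion with no-flux ends; alpha = D*dt/dx^2 must be < 0.5."""
    u = list(profile)
    n = len(u)
    for _ in range(steps):
        new = u[:]
        for i in range(n):
            left = u[i - 1] if i > 0 else u[i]
            right = u[i + 1] if i < n - 1 else u[i]
            new[i] = u[i] + alpha * (left - 2.0 * u[i] + right)
        u = new
    return u


def edge_signal(light, alpha=0.2, steps=20):
    """Activator made under light spreads; light also blocks the output locally."""
    activator = diffuse(light, alpha, steps)
    return [a * (1.0 - l) for a, l in zip(activator, light)]


# Illumination pattern: dark, lit, dark (20 cells each).
light = [0.0] * 20 + [1.0] * 20 + [0.0] * 20
signal = edge_signal(light)
print(" ".join(f"{s:.2f}" for s in signal[14:26]))
```

The output is zero inside the lit region and far into the dark region, and peaks in the dark cells that border the lit area, which is the behavior expected from an incoherent feedforward loop coupled to diffusion.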


2.3 Intermediate Cases: Some Species Diffuse, Some Do Not 


In this section, we group approaches that lie in between the two extreme cases
presented above: completely tight compartments on the one hand and unrestricted
diffusion on the other hand. Indeed, it is possible to imagine an experimental
design that restricts the diffusion of only some species (Fig. 5). Although this situation may
bring some issues (compartment content leaks out), it can also offer a number of
advantages. As seen in the silicon microchambers example above, partial enclosure
provides a strategy to deliver fuel molecules and implement a physical first-order
decay process. This is very useful for systems that cannot sustain dynamical behavior
in the long run in closed containers and is a strategy that has also been extensively
used in the field of small-molecule nonlinear chemistry [39]. In addition, controlled
leakage enables compartment-to-compartment communication and coupling, as well 
as the delivery of input signals.

Fig. 5 Strategies to create compartments with selective boundaries. a proteinosomes form compartments
with size-selective boundaries. While streptavidin-tethered DNA species cannot cross the
membrane, other DNA strands freely diffuse between compartments [51]. b a similar approach
builds on microchambers with alginate walls, impermeable to polyacrylamide-grafted DNA strands
[52]. c the attachment of DNA species directly to an immobile support (here, gel beads) prevents
their diffusion, while input and output species can freely diffuse between the beads [41]. d lipid
bilayers allow the passive diffusion of hydrophobic compounds while confining biomolecules [26].
e protein pores (e.g., α-hemolysin) can be inserted in the membrane allowing for the selective
diffusion of molecules (e.g., arabinose) [26]


2.3.1 Proteinosomes 


One approach uses semipermeable membranes that filter molecular species based on 
their sizes. Proteinosomes, for example, are artificial semipermeable microenclosures
based on a layer of crosslinked proteins. They can be generated in a highly monodis- 
perse way, and their membrane cutoff can be adjusted through the cross-link density. 
De Greef et al. created spatial arrays of proteinosomes containing non-enzymatic 
DNA networks (Fig. 5a) [51]. The compartmentalized circuit modules were trapped 
within the proteinosome, using affinity tagging to bulky partners, but small DNA 
signals could diffuse across the boundary, thus opening a communication channel.


2.3.2 Hydrogel Barriers


Hosoya et al. reported a different strategy to selectively permit signal transmission 
across different compartments, which uses the differential diffusion coefficients of 
DNA molecules across a gel matrix (Fig. 5b) [52]. Using an aluminum mold, they 
manufactured an array of millimetric-sized compartments separated by 500-µm-thick
alginate gel walls. While input/output single-stranded DNA can freely diffuse across 
the walls, DNA gates are grafted to polyacrylamide anchors that prevent their diffu- 
sion from compartment to compartment. By filling different compartments with 
different molecules (input, gates or catalyst), the authors created a primitive cellular
automaton, termed a gellular automaton, that forms specific patterns according to
the position and type of the initialized cells. The work also showed that, in the absence
of an active signal amplification system (i.e., a positive feedback loop), diffusion is
a poor mechanism to transmit information over large distances.


2.3.3 Liposomes and Lipid Bilayers 


Although liposomes’ boundaries are impermeable to many hydrophilic or charged 
compounds, it is possible to engineer pores with soluble proteins that spontaneously 
insert in the bilayer—many of these proteins are toxins that bacteria use to vamp- 
irize target cells. Protein pores have an inner channel whose dimensions are well- 
defined and can be extremely selective in the size and type of molecules that they 
allow through the membrane. Some pores are controllable through external signals 
such as light [53]. Insertion of α-hemolysin pores in liposomes creates semi-open
liposomal compartments that exchange fuel and waste with the environment [54],
extending the lifetime of the enclosed molecular network (Fig. 5e). In another study,
the same pore was used to create a self-selection feedback for directed evolution
[55]. However, the narrow channel of α-hemolysin limits its use for transporting DNA signals.
When brought in close contact, phospholipid-stabilized droplets form bilayers,
known as Droplet Interface Bilayers (DIBs), in which membrane pores can be inserted.
Dupin et al. followed this approach to spatially array dozens of microdroplets
connected, again, by α-hemolysin [26]. Since the inner channel of α-hemolysin is
narrow, the study relied on small chemicals rather than DNA as signal molecules.
The authors thus created interfacing modules to connect these signals to IVTT- or 
strand displacement-based circuits. In the case of IVTT, they showed that the protein 
pore itself can be one of the outputs of the circuit, allowing feedbacks that build 
on the reinforcement of the communication between two compartments. Pore inser- 
tion is a random phenomenon with a potential for strong signal amplification (given 
that a substantial chemical gradient exists across the membrane), and this led to the
observation of the stochastic differentiation of initially identical microcompartments.


2.3.4 On-Support Immobilization of DNA Species


Since DNA-based networks always involve multicomponent reactions, an alternative 
to building (semi) permeable enclosures is to attach one (or more, but not all) of the 
reactive species to an immobile support [56]. This is an attractive option because 
many strategies and commercial modifications are available to graft DNA to various 
supports, and these chemical steps have been intensively studied for applications such 
as microarrays and, more recently, Next-Generation Sequencing (NGS). In addition,
a number of DNA amplification schemes or DNA reactions compatible with a solid
support are available, including bridge PCR, primer walking reactions [56],
primer-grafted Recombinase Polymerase Amplification (RPA) or other enzymatic reactions
[57]. 

This option was explored in [41], where porous microbeads carrying DNA instructions
encoding autocatalytic CRNs were dispersed in a continuous layer of PEN-toolbox
mixture (i.e., the dNTPs and enzymes necessary to run the DNA instructions,
Fig. 5c). In the simplest case, sharp and sustained gradients of concentrations were
established around the active microbeads. In addition, various bead types, carrying
different instruction sets, could be combined to create large-scale collective behaviors.
Precise mathematical formulas were derived to rationalize the observation of localized
PEN out-of-equilibrium behaviors (depending on whether the geometry was
2D, 2.5D or 3D). A key insight from this study was that the kinetics of the reactions
involved in the CRN impose a critical length scale for the emergence of localized
behaviors in the absence of physical boundaries. In particular, beads carrying DNA
instructions have to be large enough to maintain their individuality; if the
bead is too small or if the chemistry is too slow compared to diffusion, then all beads
perform identically, and the potential for parallelism is lost.


3 Discussion and Applications 


Mass-action kinetic systems can be split without altering their dynamical properties, 
and this property extends to computational chemical networks. The insensitivity to 
scale is generally true down to microscopic length scales, offering the possibility for 
massive parallelism using the same amount of material as used for a single “classic” 
experiment (e.g., micro- to milliliters). 

This property of CRNs is unusual among computational systems. It is alien to
standard silicon-based hardware and, more generally, to all computation devices based
on physical processes (e.g., mechanical computers). However, being split-resilient is
a property observed in many biological systems, not only unicellular, but also higher
organisms including many plants, fungi and some animals: for these organisms, a
splitting event creates two viable individuals from one, with properties identical to
each other and to the parent. In many cases, the two parts will grow back to the size
and shape (intensive variables) of their parent or at least have the potential to do so,
if the conditions are favorable. In the animal kingdom, the hydra is the classical
example.

Fig. 6 Applications of parallelized molecular computations. a independent compartments
containing an identical circuit but receiving different inputs. b independent compartments containing
different circuits (e.g., from a parametric family) all working on the same inputs. c cross-talking
compartments collaborating to compute a global response. d examples of input types

Beyond the biological analogy, we believe this asset ought to be better 
explored and exploited in future work on artificial molecular computation. Depending 
on whether the sub-compartments are independent or not, we see three interesting 
directions to harness parallel CRNs (Fig. 6). 


i. Independent compartments containing an identical circuit but receiving different 
inputs 
ii. Independent compartments containing different circuits, all working on the same 
inputs 
iii. Cross-talking compartments collaborating to compute a global response. 


3.1 Independent Compartments Containing an Identical 
Circuit but Receiving Different Inputs 


In the simplest case, split resilience can be used to repeatedly perform the same 
operation on different inputs. Bacteria, for example, can be seen as vehicles of their 
genomes, each testing the reproductive success of a slightly different genome variant 
or testing the same variant under different external forcings. 

The same idea can be used for microcompartmentalized artificial molecular 
programs. This is because microcompartmentalization itself is a way to diversify 
inputs. For a given reactor of volume V, as soon as the input concentration gets close 
to the critical value $C_c = (VN)^{-1}$ (where N is Avogadro’s number), stochastic 
partitioning ensures that each compartment receives a different count of inputs. In the case 
where $[\text{inputs}] \ll C_c$, only two types of compartments are created, with or without 
one molecule of input. This latter case is of particular interest if input molecules 
come from a library: a collection of similar, yet distinct molecules—for example, 
a partially randomized DNA strand. Indeed, Poissonian partitioning ensures that all 
members of the library will be processed in an identical way and independently of 
all others. 
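
The partitioning argument above can be made concrete with a short numerical sketch. It assumes ideal Poisson loading of identical compartments; the function names and the 1 pL droplet volume are illustrative choices, not values taken from the experiments cited in this chapter.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def critical_concentration(volume_litres):
    """Concentration (mol/L) at which a compartment holds one molecule on average."""
    return 1.0 / (volume_litres * AVOGADRO)


def mean_occupancy(concentration_molar, volume_litres):
    """Average number of input molecules per compartment."""
    return concentration_molar * volume_litres * AVOGADRO


def poisson_pmf(k, lam):
    """Probability that a compartment receives exactly k molecules."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def occupancy_fractions(lam):
    """Fractions of compartments holding zero, one, or more than one molecule."""
    empty = poisson_pmf(0, lam)
    single = poisson_pmf(1, lam)
    multiple = max(0.0, 1.0 - empty - single)
    return empty, single, multiple


if __name__ == "__main__":
    volume = 1e-12  # hypothetical 1 pL droplet
    c_crit = critical_concentration(volume)
    for fraction in (0.1, 1.0):
        lam = mean_occupancy(fraction * c_crit, volume)
        print(fraction, occupancy_fractions(lam))
```

At one tenth of the critical concentration, about 90% of the compartments are empty and nearly all of the rest hold a single molecule, which is the regime described above for library processing.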

Building on this idea, one of us reported a Positive Feedback Loop (PFL) network 
that tests the presence of a specific enzymatic activity and accordingly enables (or not) 
genetic amplification. Distributed in microdroplets together with a genetic library, 
this system implements an in vitro evolution algorithm, poised to select for a pre- 
defined catalytic activity. Over rounds of selection and diversification, this molec- 
ular algorithm will spontaneously ascend the local fitness landscape [58]. Although 
Directed Evolution experiments are not usually described in terms of an algorithmic 
search process, it is likely that the ability to perform precise computations on molec- 
ular signals, at a molecular level, and with a molecular outcome—i.e., molecular 
programming—can bring strong benefits to this field. 

This molecular evolution approach could be extended to optimize DNA-based 
molecular programs themselves, since their behavior is also genetically encoded. An 
array of compartmentalized molecular programs could clone and differentiate, each 
new compartment computing a function in a slightly different manner. By applying 
a selection pressure, one could then optimize a given molecular program or discover 
new functional architectures. This idea has been applied in silico [59], but presents 
intimidating technical difficulties in vitro, since it requires one to engineer a feedback 
loop that affects the architecture of the molecular program itself. 

In principle, different inputs can also be deterministically injected in the various 
sub-compartments. When working with very small volumes, it can be experimen- 
tally tricky to address each compartment individually, although various microfluidic 
or robotic platforms exist [60-62]. Approaches to create bead-displayed diversity, 
for example, using Split and Pool [63], could be explored in this respect. Given 
that the dominant application of molecular programming is to process molecular input 
signals (in an attempt to keep the full sensing-processing-actuation 
chain at the molecular level), it becomes important to develop interfacing strategies 
for various types of molecules, including for example proteins [64, 65] and small 
molecules [66]. 


3.2 Independent Compartments Containing Different 
Circuits, All Working on the Same Inputs 


The second option to harness the parallelism of molecular circuits is to create many 
compartmentalized reactors containing different circuits. This basically allows one to 
apply a family of functions to a common input condition, an approach that resembles 
the Multiple Instruction, Single Data (MISD) paradigm in parallel computing. 
Genot et al. demonstrated this idea by introducing a microfluidic droplet chip that 
scans up to three component concentrations simultaneously [21]. With this device, 
millions of droplets, each containing a different combination of the three compounds 
of interest, and collectively mapping any desired parameter subspace, were generated 
in less than one hour. Inert fluorescent dropcodes encoded the concentration of the 
parametric compounds and allowed one to recover the experimental condition for 
each droplet, even after the droplets had been mixed and manipulated at will. 

In the process of building a DNA-based reaction network, one must decide the 
concentration of each functional DNA strand, enzyme, additive, etc. Since all 
these parameters are continuous variables, finding a functional region of 
parameter space usually takes painstaking rounds of trial and error. The microfluidic 
platform was used to map the experimental bifurcation diagram of a predator-prey 
oscillator, simultaneously scanning its three principal control parameters: the prey 
reproduction rate, the global growth rate and the species lifetime (through the 
concentration of an exonuclease). The fine resolution of the scan 
exposed a tiny region of parameter space with an exotic behavior: there, droplets 
would oscillate for some time, then stay completely flat for a number of periods, 
then resume strong regular oscillations. The observation of this mode, known as 
“hard excitable”, remains to date the only example in DNA-based circuits. 

More generally, this droplet-scanning platform addresses a common issue of 
multicomponent systems which require the titration of many concentrations. In prin- 
ciple, parallelization using microcompartments could be used to expedite the opti- 
mization process, by testing different conditions in each microreactor. Note that this 
assumes that the compound of interest does not cross the compartment boundaries 
[25, 26]. 


3.3 Cross-Talking Compartments Collaborating to Compute 
a Global Response 


Notwithstanding the examples presented above, it is still difficult to distribute 
a molecular program in many sub-compartments and simultaneously control the 
communication channels between these units. Pioneering works based on top-down 
microfluidics and microfabrication [23] involve a dozen compartments at most, 
but it is likely that thousands—or even millions—of compartments would be needed 
for a powerful parallel computation to emerge. 

On the bottom-up (self-assembly) side, the trafficking of molecules across vesi- 
cles or droplet interfaces is still challenging. Molecular pores, which spontaneously 
insert in lipid bilayers, provide an interesting option, but DNA strands are large 
macromolecules, and biological channel proteins do not naturally allow DNA traffic. 
For instance, the pores used in nanopore sequencing evolved to traffic small ions and 
organic molecules. It took a lot of protein engineering to thread DNA through them, 
and the process is not spontaneous: it requires a voltage difference as driving force 
and a polymerase for regulation [67]. The engineering of DNA-based membrane 
pores with larger apertures could provide some interesting options. 

Here again, Nature could provide some inspiration, as cells do not use protein 
pores to traffic DNA, but instead the fusion and splitting of small vesicles—of 
which exosomes provide a prominent example. Exosomes are nanometer-sized (~40-200 nm) 
extracellular compartments that are secreted by some cells, enclosing 
biomolecules like RNA, DNA or proteins, and which are captured by other cells 
at remote locations. Although the exact mechanism regulating the production and 
absorption (endocytosis) is still being elucidated, exosomes seem to work as a “postal 
service” by which cells can send molecular messages to each other [68]. 

Finally, although we have focused here on the spatial parallelization of mass- 
action CRNs—that is, systems governed by concentrations and Ordinary Differen- 
tial Equations (ODEs)—we stress that other approaches exploit the parallelism of 
computational chemical operations to manipulate complex libraries [69, 70]. One 
possibility is to perform the computation directly within a single (supra-) molecule 
in solution, such as a single DNA nanostructure [71-73]. Another approach would be 
to encode the computation in successive transformations of a single DNA sequence. 
For this purpose, the developing toolkit of solid-phase DNA manipulations [57] could 
enable sequential molecular transformations at tremendous throughputs. 

What could be the future applications of parallel molecular computations? There 
are many tantalizing areas where it could solve open problems. In directed evolu- 
tion, it could steer the evolution of DNA, RNA or proteins toward highly functional 
forms. Rather than screening for the absence or presence of a function in a molecule 
(single objective evolution), a molecular circuit could weigh multiple objectives (e.g., 
for an enzyme, it could assess both its thermodynamic constant, measured by its 
Michaelis-Menten constant K_m, and its kinetic constant, measured by its turnover 
rate V) and assign an integrated fitness score to each compartment. After each round, 
a compartment would get a chance to be replicated according to its fitness, thus 
steering the population toward a region of desirable characteristics. Parallel molec- 
ular computations could also empower the nascent field of DNA data storage, for 
instance, to find the number of occurrences of a pattern in a database. A DNA 
database could be split and compartmentalized, and the same computation could be 
applied simultaneously to each compartment by a molecular program. For very large 
databases (say, in the petabyte or exabyte range or beyond), molecular computations 
may beat electronic computations in terms of energy cost or speed, simply because 
they scale much better. 
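
As a rough illustration of the multi-objective selection round described above, the following sketch scores each compartment and resamples the population in proportion to that score. The score used here (turnover rate divided by the Michaelis-Menten constant) is only one possible way to integrate the two objectives and is an assumption of this sketch, as are all names and the dictionary layout.

```python
import random


def integrated_fitness(km, turnover):
    """Hypothetical integrated score: high turnover and low K_m are both rewarded."""
    return turnover / km


def next_generation(compartments, rng, size=None):
    """Resample compartments with probability proportional to their fitness."""
    weights = [integrated_fitness(c["km"], c["turnover"]) for c in compartments]
    size = len(compartments) if size is None else size
    return rng.choices(compartments, weights=weights, k=size)


if __name__ == "__main__":
    rng = random.Random(0)
    population = [
        {"name": "variant_a", "km": 1e-3, "turnover": 1.0},
        {"name": "variant_b", "km": 1e-4, "turnover": 5.0},
    ]
    offspring = next_generation(population, rng, size=10)
    print([c["name"] for c in offspring])
```

A molecular implementation would compute the score chemically inside each compartment; the sketch only shows the population-level bookkeeping of one selection round.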


Abstract We describe social DNA nanorobots, which are autonomous mobile DNA 
devices that execute a series of pair-wise interactions between simple individual 
DNA nanorobots, producing a desired overall behavior for the group of 
nanorobots that can be relatively complex. We present various designs for social 
DNA nanorobots that walk over a 2D nanotrack and collectively exhibit various 
programmed behaviors. These employ only hybridization and strand-displacement 
reactions, without use of enzymes. The novel behaviors of social DNA nanorobots 
designed here include: (i) Self-avoiding random walking, where a group of DNA 
nanorobots randomly walk on a 2D nanotrack and avoid the locations visited by 
themselves or any other DNA nanorobots. (ii) Flocking, where a group of DNA 
nanorobots follow the movements of a designated leader DNA nanorobot, and (iii) 
Voting by assassination, a process that starts with two groups of DNA nanorobots 
of unequal size; when a pair of DNA nanorobots from distinct groups collide, 
one or the other is assassinated (by being detached from the 2D nanotrack 
and diffusing into the solution away from it); eventually all members 
of the smaller group of DNA nanorobots are assassinated with high likelihood. To 
simulate our social DNA nanorobots, we used a surface-based CRN simulator. 


1 Introduction 


1.1 Motivation 


Due to its simple base-pairing and predictable secondary structure, DNA is an ideal 
material to construct useful molecular-scale objects and devices. DNA nanorobots 
are molecular-scale synthetic devices composed primarily of DNA that can execute 
a variety of operations. Autonomous DNA nanorobots operate without outside mediation. 
DNA walkers are a class of mobile DNA nanorobots which can move over a 
nanotrack composed of DNA stepping stones. The nanotrack may be 1D or 2D and 
may be either a self-assembled DNA nanostructure or a set of DNA strands affixed 
to a surface. Autonomous DNA nanorobots have been demonstrated to perform a 
small number of moderately complex tasks including walking over nanostructures, 
maze traversal, and cargo delivery activities. 

A major remaining challenge in the field of DNA nanoscience is to increase the 
complexity and diversity of the activities DNA nanorobots can perform in spite of 
practical limitations on the complexity of each individual DNA nanorobot. How can 
one design molecular-scale systems with multiple mobile autonomous nanorobots 
which exhibit a group behavior that is significantly more complex than the behavior 
of individual nanorobots? We take inspiration from the field of Sociobiology pio- 
neered by Wilson, which demonstrated how social insects such as ants and honeybees 
perform a wide variety of relatively complex organized group behaviors, even though 
the individual insects have quite limited brains. 


1.2 Summary of Our Results 


We describe social DNA nanorobots, which are autonomous mobile DNA devices that 
execute a series of pair-wise interactions between simple individual DNA nanorobots, 
causing a desired overall outcome behavior for the group of nanorobots which can 
be relatively complex. We present various designs for social DNA nanorobots that 
walk over a 2D nanotrack and collectively exhibit various programmed behaviors. 
In our designs, we employed only hybridization and strand-displacement reactions, 
without use of enzymes; strand-displacement reactions are used for communication 
between pairs of nearby DNA nanorobots (where a finite amount of information is 
transferred between a pair of nearby DNA nanorobots). The novel behaviors of social 
DNA nanorobots designed here include: 
(i) Self-avoiding random walking, where a group of DNA nanorobots randomly walk 
on a 2D nanotrack and avoid the locations visited by themselves or any other DNA 
nanorobots. 
(ii) Flocking, where a group of DNA nanorobots follow the movements of a designated 
leader DNA nanorobot, and 
(iii) Voting by assassination, a process that starts with two groups of DNA nanorobots 
of unequal size; when a pair of DNA nanorobots from distinct groups 
collide, one or the other is assassinated (by being detached from the 2D 
nanotrack and diffusing into the solution away from it); eventually 
all members of the smaller group of DNA nanorobots are assassinated with high 
likelihood. 

We simulated our social DNA nanorobots using the Surface CRN Simulator of 
Clamons [1, 2]. 


1.3 Organization 


• In Sect. 2, we overview the field of Sociobiology and in particular the behavior of 
social insects. 

• In Sect. 3, we review prior DNA nanorobots, including prior DNA walkers and 
programmable DNA nanomachines. 

• In Sect. 4, we present our designs and simulations of various social DNA 
nanorobots. 

• In Sect. 5, we conclude with a discussion of future work and challenges, including: 
further development of simulation software for social nanorobots, experimental 
demonstrations of social DNA nanorobots, design of further social DNA nanorobot 
behaviors, and possible use of the diffusion of pheromone-like DNA molecules 
for communication between social DNA nanorobots. 


2 Sociobiology 


The concept of Sociobiology was defined by Wilson [3] in 1975. The field of Sociobi- 
ology aims to investigate and explain the evolved social behaviors of social animals. 
Sociobiology studies have been made for example of mating patterns, aggression, 
nurturance, pack hunting, and the hive society of social insects. High-level social 
organizations can be found in social insects that have the following three charac- 
teristics: cooperative brood care, overlapping generations, and a division of labor 
into reproductive and non-reproductive groups [4-8]. Social insects include ants and 
termites, and some social bees and wasps. Social insects gain several advantages by 
living together. They work together to gain resources, share their findings with each 
other, defend their home when under attack, and attack other insects for territory and 
food. Social insect communities are divided into castes by their function and behav- 
ior; these include a reproductive caste (e.g., the queen) and the sterile caste (soldiers 
and various types of workers) [9]. The reproductive caste carries out the basic func- 
tion of reproduction, and the sterile castes do various types of tasks including taking 
care of the reproductive members. There are generally multiple subcastes of insects 
within the sterile castes, which do specialized tasks. For example, the soldiers defend 
the colony against predators, and the workers are responsible for foraging, construction 
and repair of the nest, feeding the larvae, and brood care; these tasks are 
typically done by specialized sterile subcastes (which may depend on the insect’s age 
or development). 

The communication signals between social insects that are required for this coordination 
can be mechanical, optical, or chemical. Mechanical communication between social 
insects includes direct physical contact between members of the insect colony. Optical 
communication between social insects includes visual displays and stylized movements, 
sometimes termed dances, that communicate discoveries of food and their 
locations. Chemical (olfactory) communication between social insects includes chemical tracks 
that specify a path to important locations, as well as diffused chemical factors known 
as pheromones. Pheromones can trigger a social response in members of the same 
species, also often playing an important role in the development and maintenance of 
insect society [10]. Ants and termites, for example, are specialized to perform specific 
functions (e.g., attacking and foraging), and they communicate by laying down chemical tracks 
that specify a path to important locations such as food sources. (See Sect. 5 for a 
discussion of the possible use of diffusion of pheromone-like molecules for communication 
between nanorobots.) 

Quite surprisingly, the overall control of these insect communities is not exercised by 
the reproductive caste; instead, control is distributed within the sterile 
subcastes via group social interactions. Particularly complex collective behavior 
is found in honeybee colonies, where individual honeybees can be specialized for 
foraging and harvesting. For example, after traveling outside the colony’s nest to 
find possible locations of flowers with pollen, successful honeybee foragers can 
share information about the direction and distance to a food source [11] by use of 
a dance known as a waggle dance (a particular figure-eight dance) [11-13]. 
These dances communicate (i) the existence of pollen, (ii) its quantity, and also 
(iii) its general direction. Also, the honeybees of the species Apis mellifera perform 
tremble dances, which recruit receiver bees to collect nectar from returning foragers 
[14]. Seeley [15] demonstrated that honeybees also use a form of democratic voting 
(executed without input from the queen bee) to make important decisions, such as 
the best source of flowering plants providing pollen or the best location for a new 
hive home for the honeybee colony. 

Our paper is not concerned with evolution per se; rather, we propose designs 
of nanorobots that are derived from prior Sociobiology studies of 
these behaviors of social insects. We are particularly inspired by 
social insects such as ants and bees, which exhibit complex collective behavior 
even though the individual insects have quite limited brains, a limitation shared 
by DNA nanorobots. Notable diverse activities of social insects that we aim to mimic 
using DNA nanorobots include: 


• Random walking: insects of the colony make random walks. 
• Flocking: a group of insects of the colony follow a selected leader insect. 
• Guarding: a group of insects of the colony follow a particular individual insect of 
the colony and guard it from attack by another group. 
• Attacking: a group of insects of the colony attack another group. 
• Communication: between insects of the colony. 
• Democratic group decision making (voting): among groups of insects of the colony. 
• Foraging: a select foraging group of insects of the colony leave the colony and 
attempt to discover new sources of food, and then report their discoveries back to 
the colony. 
• Harvesting: a harvesting group of insects of the colony travel (navigating by either 
(a) following successful foragers, (b) following their chemical trail, or (c) via 
instruction from successful foragers) to the new sources of food and harvest it. 


3 Prior DNA Nanorobots 


We use the term nanorobot for a molecular-scale device that can execute a variety of 
operations. A DNA (or RNA) nanorobot is a nanorobot composed of nucleic acids. 
Our overall aim is to take inspiration from the field of Sociobiology to design novel 
nanorobotic systems, but our designs will employ in part various known nanorobotic 
design techniques, which we briefly overview in this section. 

In 2000, Yurke and Turberfield [16] demonstrated the first DNA device, a DNA 
tweezer, that used DNA hybridization to power its movements. Their DNA tweezer 
was nonautonomous; its movements were controlled by adding ssDNA strands. 
The field of DNA nanorobotics has rapidly evolved from nonautonomous molecular 
devices, in which each successive movement needed to be controlled externally, to 
autonomous molecular devices that operate without control from the external 
environment. Autonomous DNA nanorobot devices that executed in-place motions 
were demonstrated in [17-20]. 


3.1 Prior DNA Walkers 


A DNA nanorobot is autonomous if it operates without outside intermediate control, 
and otherwise is non-autonomous. In the past decades, researchers designed and 
experimentally realized DNA devices that can autonomously conduct complex tasks 
such as autonomous walking, maze traversal, and cargo delivery activities. 

A mobile DNA nanorobot is a DNA nanorobot that locomotes in some way. A 
DNA walker (also termed a Mobile DNA nanorobot) is a mobile DNA nanorobot 
which moves over a nanotrack composed of ssDNA pads. The nanotrack may be 1D 
or 2D and may be either a self-assembled DNA nanostructure or a set of DNA strands 
affixed to a surface. An autonomous DNA walker is a mobile DNA nanorobot that 
locomotes autonomously. 

The concept of the DNA walker was first defined and named by Reif [21] in 
2002, who gave two autonomous designs for bidirectional movement. Sherman and 
Seeman [22] and Shin and Pierce [23] then experimentally realized the first DNA 
walkers, which were bipedal walkers that moved along a linear track; these were 
non-autonomous walkers that required external control for each step. Yin et al. [24] 
and Tian and Mao [19] also demonstrated bipedal non-autonomous walkers that walked 
foot-over-foot along a linear track. 

In 2004, the first autonomous DNA walker was experimentally demonstrated 
[25, 26] by Reif’s group (in collaboration with Turberfield). This autonomous DNA 
walker and many subsequent DNA walkers [20, 27-31] made use of a series of 
enzymatic reactions to power the locomotion; for example, Sahu [29] demonstrated a 
DNA nanotransport device which is powered by the strand-displacing polymerase φ29. 
Other autonomous DNA walkers employed DNAzymes, for example, Pei et al. [32] 
demonstrated a multipedal DNA walker that moves on a 2D substrate in a biased 
random walk and Lund [33] demonstrated DNA walkers that traversed paths on a 
2D nanostructure guided by landmark molecules affixed to the 2D nanostructure. 
There are also many known DNA walkers powered by hybridization reactions. Also, 
autonomous DNA walkers have been demonstrated that navigate networks: Wickham 
et al. demonstrated a DNA-based molecular motor that can be routed through a 
network of tracks [34]. Chao et al. [35] demonstrated a DNA walker that conducted 
single-molecule parallel depth-first search on a 2D DNA origami surface. Reviews 
of DNA-based walkers are given in [36-38]. 


3.2 Prior Programmable DNA Nanorobots 


Various schemes for programmable autonomous DNA nanorobots that do computa- 
tions as they walk have been described; those of Yin and Reif et al. [39] use enzymes, 
and those of Reif and Sahu [40] use DNAzymes. One application of the DNAzyme 
programmable autonomous DNA nanorobot of [40] was a DNAzyme router for pro- 
grammable routing of nanostructures on a 2D DNA addressable lattice, where the 
2D DNA addressable lattice is embedded with a network of DNAzymes and where 
the routed path for the input nanostructure could be programmed by modifying its 
sequence. The input was encoded as a set of hairpins on the walker. The transport 
of the walker across the surface simulated a finite state machine that switched states 
based on input, where the state of the automaton was indicated by the DNAzyme 
that currently binds the walker. The various DNAzymes embedded on the 2D surface 
consumed the input as the walker moved. 
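
The finite-state-machine view of such a router can be sketched in a few lines. This is an abstract illustration only: the state names D0 and D1 stand for hypothetical DNAzyme sites, the transition table is an arbitrary parity example, and none of it reproduces the actual sequences or layout of [40].

```python
def run_router(transitions, start_state, input_symbols):
    """Follow the walker's path; each step consumes one input symbol."""
    state = start_state
    path = [state]
    remaining = list(input_symbols)
    while remaining:
        symbol = remaining.pop(0)             # the input is consumed as the walker moves
        state = transitions[(state, symbol)]  # next DNAzyme site to bind the walker
        path.append(state)
    return path


# Hypothetical two-site router that tracks the parity of the '1' symbols read so far.
PARITY_ROUTER = {
    ("D0", "0"): "D0",
    ("D0", "1"): "D1",
    ("D1", "0"): "D1",
    ("D1", "1"): "D0",
}

if __name__ == "__main__":
    print(run_router(PARITY_ROUTER, "D0", "1011"))
```

In this picture, the current state is the site that holds the walker and the program is the walker's own input sequence, so reprogramming the route only requires changing that sequence.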

The Seeman group [41] demonstrated a non-autonomous programmable DNA 
nanorobot that transported a series of molecules to form a molecular-scale assembly 
line. Their nanorobot picked up cargo in a programmable manner when it walked 
on a DNA track. This process was non-autonomous since it required addition of 
appropriate fuel strands at specific time instants. 


3.3 Prior Autonomous DNA Walkers that Do Molecular 
Cargo-Sorting on a 2D Nanostructure 


The prior work most similar to that reported here is the work of Thubagere [42], which 
demonstrated an ingenious molecular-scale system with a group of autonomous DNA 
nanorobots executing a molecular cargo-sorting task on a 2D nanostructure. The 
2D nanostructure initially had various kinds of molecular cargo that needed to be 
transported to different targeted locations on a DNA nanostructure (a DNA origami 
surface). Each DNA nanorobot performed a random walk over the 2D nanostructure 
and, when encountering a molecular cargo, loaded it and transported it to 
the targeted location (the goal). The nanorobots used a simple protocol to perform a complex 
cargo-sorting task. As a robot randomly walked on the surface, if it moved 
within reach of a cargo molecule, it picked the cargo up and continued walking 
randomly. If it moved within reach of a goal molecule at the targeted location of the cargo 
molecule, the robot dropped the cargo off. The nanorobot repeated the above process 
until all cargo molecules were sorted. The pick-up and drop-off processes used 
strand displacement reactions. 
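
The pick-up, walk, drop-off protocol described above can be mimicked by a small lattice simulation. This is a minimal sketch under simplifying assumptions that are not taken from [42]: a square grid, a single robot carrying at most one cargo, and "within reach" meaning a Manhattan distance of at most one. All names are illustrative.

```python
import random


def neighbours(pos, size):
    """Grid cells reachable in one step from pos on a size x size track."""
    x, y = pos
    candidates = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [(a, b) for a, b in candidates if 0 <= a < size and 0 <= b < size]


def within_reach(a, b):
    return abs(a[0] - b[0]) + abs(a[1] - b[1]) <= 1


def sort_cargo(size, cargo, goals, start, rng, max_steps=100_000):
    """cargo maps position -> kind; goals maps kind -> position.

    Returns the list of delivered cargo kinds and the number of steps taken.
    """
    cargo = dict(cargo)
    delivered = []
    carrying = None
    pos = start
    steps = 0
    while (cargo or carrying is not None) and steps < max_steps:
        if carrying is None:
            for cargo_pos in list(cargo):
                if within_reach(pos, cargo_pos):
                    carrying = cargo.pop(cargo_pos)   # pick up
                    break
        elif within_reach(pos, goals[carrying]):
            delivered.append(carrying)                # drop off at the matching goal
            carrying = None
            continue
        pos = rng.choice(neighbours(pos, size))       # unbiased random step
        steps += 1
    return delivered, steps


if __name__ == "__main__":
    cargo = {(0, 0): "red", (3, 0): "blue"}
    goals = {"red": (3, 3), "blue": (0, 3)}
    print(sort_cargo(4, cargo, goals, start=(1, 1), rng=random.Random(1)))
```

The robot needs no memory beyond what it is currently carrying, which is what makes the protocol simple enough to implement with strand displacement.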


4 Design and Simulation of Social DNA Nanorobots 


Our approach is to adapt collective behavior strategies of social insects into molecular- 
scale nanorobots, which will be specialized to perform specific functions. We describe 
social DNA nanorobots, which are autonomous mobile DNA devices that execute 
a series of pair-wise interactions (only between pairs of nearby nanorobots) on a 
2D DNA nanotrack that determine an overall desired outcome behavior as a group. 
We give detailed designs that program social behaviors via interactions between 
individual DNA molecules. 


4.1 Social DNA Nanorobot Behaviors Designed 
and Simulated 


A basic behavior of social DNA nanorobots is Random Walking, where a group of 
DNA nanorobots make random traversals of a 2D nanotrack. For random walking, 
we will make use of a known design of [42]. 

The novel behaviors of social DNA nanorobots described here include: 
• Self-avoiding random walking, where a group of DNA nanorobots walk on a 
2D nanotrack and avoid the locations visited by themselves or any other DNA 
nanorobots. 
• Flocking, where a group of DNA nanorobots follow the movements of a designated 
leader DNA nanorobot. 
• Voting by assassination, a process that starts with two groups of DNA nanorobots 
of unequal size; when a pair of DNA nanorobots from distinct groups 
collide, one or the other can be assassinated (by being detached from the 2D 
nanotrack and diffusing into the solution away from it); eventually all 
members of the smaller group of DNA nanorobots are assassinated with high 
likelihood. 
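
To see why the smaller group tends to lose in voting by assassination, one can compute the outcome probability in a deliberately simplified model. The sketch assumes that every collision removes exactly one nanorobot and that the victim belongs to either group with a fixed probability (one half by default); this is an assumption made for illustration, not the kinetics of the designs presented later, and the function name is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def p_minority_eliminated(majority, minority, p_kill_minority=0.5):
    """Probability that the minority group is wiped out before the majority."""
    if minority == 0:
        return 1.0
    if majority == 0:
        return 0.0
    return (
        p_kill_minority
        * p_minority_eliminated(majority, minority - 1, p_kill_minority)
        + (1.0 - p_kill_minority)
        * p_minority_eliminated(majority - 1, minority, p_kill_minority)
    )


if __name__ == "__main__":
    for sizes in ((1, 1), (2, 1), (10, 5), (20, 5)):
        print(sizes, p_minority_eliminated(*sizes))
```

Under this model, equal groups give an even outcome, and the chance that the smaller group is eliminated first approaches one as the size gap grows.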


4.2 Software for Stochastic Simulations of the Social DNA 
Nanorobots Behaviors 


We made stochastic simulations of the social DNA nanorobot behaviors listed above. 
Our simulation model is adapted from the Surface CRN Simulator of Clamons [1, 
2]. A chemical reaction network (CRN) consists of a set of chemical reactions and their rates. 
In a 2D surface CRN model, each individual nanorobot to be simulated is a molecule 
attached at a specific position on a 2D surface, so that the nanorobot can only interact 
with its neighbors and their attachment strands. The behaviors of DNA nanorobots 
moving on a 2D nanotrack can be modeled in this 2D surface CRN model as a set 
of chemical reactions between DNA walkers and DNA strands affixed to a surface. 
State transitions model these chemical reactions (e.g., toehold-mediated strand 
displacements and dehybridizations). We applied the Surface CRN Simulator to assess 
the performance of our social DNA nanorobot designs. Assessment criteria were 
formulated to specify the performance of the designs, 
and redesigns were made to improve performance. 
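
The surface CRN idea can be illustrated with a small stochastic simulation in the style of the Gillespie algorithm. This is a minimal sketch written for this discussion; it is not the code or the interface of the Surface CRN Simulator [1, 2]. The state labels W (walker) and P (empty pad), the rule format, and the rate value are assumptions.

```python
import random


def applicable_events(grid, rules):
    """List (rate, site_a, site_b, products) for each ordered neighbour pair matching a rule."""
    rows, cols = len(grid), len(grid[0])
    events = []
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((0, 1), (1, 0), (0, -1), (-1, 0)):
                r2, c2 = r + dr, c + dc
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    key = (grid[r][c], grid[r2][c2])
                    for products, rate in rules.get(key, []):
                        events.append((rate, (r, c), (r2, c2), products))
    return events


def simulate(grid, rules, rng, max_events):
    """Run up to max_events reactions; return the final grid and the elapsed time."""
    grid = [row[:] for row in grid]           # work on a copy of the initial surface
    t = 0.0
    for _ in range(max_events):
        events = applicable_events(grid, rules)
        if not events:
            break
        total_rate = sum(event[0] for event in events)
        t += rng.expovariate(total_rate)      # exponential waiting time
        weights = [event[0] for event in events]
        _, a, b, products = rng.choices(events, weights=weights)[0]
        grid[a[0]][a[1]], grid[b[0]][b[1]] = products
    return grid, t


if __name__ == "__main__":
    surface = [["P"] * 3 for _ in range(3)]
    surface[1][1] = "W"
    hop = {("W", "P"): [(("P", "W"), 1.0)]}   # a walker hops onto a neighbouring empty pad
    final, elapsed = simulate(surface, hop, random.Random(0), max_events=20)
    print(final, elapsed)
```

Each rule rewrites the states of two neighbouring sites, so a walker step, a collision, or an assassination can all be expressed as entries in the same rule table.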

In the next subsections, we give detailed domain-level designs of social DNA 
nanorobots that conduct random walking, self-avoiding random walking, flocking 
and voting by assassination. 


4.3 A Prior DNA Nanorobot that Autonomously Walks 


There are many prior known designs for DNA nanorobots that make random traversals 
on a 2D DNA nanotrack. Our design for a DNA nanorobot that autonomously 
walks uses the design for a random DNA walker of Thubagere [42]. A nanotrack with 
an array of attached ssDNA pads that are self-assembled on a surface is illustrated 
in Fig. 1 (for simplicity it is only illustrated in 1D). The 2D arrangement of pads 
on a 2D nanotrack is illustrated in Fig. 2. There are two types of pads: (i) ssDNA 
p0 = B*A* attached at its 3’ end and (ii) ssDNA p1 = C*B* attached at its 5’ end. 
(These ssDNA sequences, and all sequences given subsequently, are written in 5’ to 
3’ order.) 

Fig. 1 Known design of a DNA nanorobot that executes a random walk (for simplicity only 
illustrated in 1D) 

Fig. 2 2D locations of the random walker’s pads on a 2D nanotrack 

There is a single type of ssDNA nanorobot, the DNA Walker W = ABC 
(see Fig. 1), which operates as follows: 


(a) A low concentration of W walker strands is added to the buffer solution containing 
the 2D nanotrack, and a few of these W strands hybridize to random pads 
of the nanotrack. 

(b) The buffer solution is replaced, so as to remove the remaining non-hybridized 
W walker strands from the solution surrounding the nanotrack. 

(c) As illustrated in Fig. 1, at first, the W strand hybridizes with a p0 = B*A* pad, 
which is State 0. In State 1, the unpaired domain C of W hybridizes with 
domain C* of p1. Then domain B* of pad p1 can displace the domain B* of pad 
p0, so domain B of W hybridizes with both p0 and p1, which is the intermediate 
State 1.5, and this process is reversible. If the strand displacement completes, W 
enters State 2, in which W detaches fully from the p0 pad and hybridizes with 
a p1 = C*B* pad. From State 0 to State 2, W walks from the p0 pad to the p1 pad due 
to hybridization and strand displacement. Similarly, W walks from the p1 pad to the p0 
pad when it proceeds from State 2 to State 1.5 to State 1 to State 0. 

(d) W walks successively from either (1) the p0 to the p1 pad of the nanotrack, or 
(2) the p1 to the p0 pad of the nanotrack, as described above. As a result, W walks 
randomly over the nanotrack. 
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Figure 3 gives an example simulation (also see mp4 at [43]) run for the W 
nanorobot (in green) randomly walking on a 2D nanotrack. Where it randomly walks 
on the nanotrack, the grid turns to orange to show the trace of the W nanorobot. The 
W Walker traverses uniformly randomly over the nanotrack. 
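The walk described above can be sketched in a few lines of Python. This is only an illustrative lattice model, not the Surface CRN Simulator used for Fig. 3: the checkerboard placement of p0 and p1 pads, the function names `pad_type` and `random_walk`, and the track size are all assumptions made for the example. Each orthogonal step takes W from a p0 pad to a p1 pad or vice versa, mirroring the State 0 to State 2 transition and its reverse.

```python
import random

def pad_type(x, y):
    """Assumed checkerboard layout: even cells hold p0 pads, odd cells hold p1 pads."""
    return "p0" if (x + y) % 2 == 0 else "p1"

def random_walk(width, height, steps, seed=0):
    """Return the list of (x, y) pads occupied by a single walker W."""
    rng = random.Random(seed)
    x, y = rng.randrange(width), rng.randrange(height)
    path = [(x, y)]
    for _ in range(steps):
        # Orthogonal neighbours that are still on the track.
        moves = [(x + dx, y + dy)
                 for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                 if 0 <= x + dx < width and 0 <= y + dy < height]
        x, y = rng.choice(moves)
        path.append((x, y))
    return path

path = random_walk(8, 8, 50)
# Pad types alternate along the path because every step is orthogonal.
print([pad_type(x, y) for x, y in path[:6]])
```

The set of cells in `path` plays the role of the orange trace in the simulation figure.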


4.4 Prior Demonstrated Technique for Hybridization 
Inhibition of Short Sequences Within the Hairpin Loops 


Our DNA nanorobots make use of hybridization inhibition of short ssDNA
sequences within hairpin loops; this technique seems well established. Extensive prior
published work (see the survey [44]) has demonstrated, in simulation as well as
experimentally, localized reactions that use DNA hairpins to inhibit hybridization
of short ssDNA sequences within the hairpin loops. For example, the hybridization
chain reaction (HCR) [45] makes use of a series of strand displacements to open
a series of DNA hairpins; while HCR was initially demonstrated [45] in well-mixed
solutions, Reif’s group has shown that it can also be made to operate reliably in
localized reactions. Reif’s group has carried out extensive probabilistic analysis and
simulations (in the studies [46-48]) and has also experimentally implemented (in the
paper [49]) a localized version of HCR where the hairpins are localized on DNA
origami.

Fig. 3 Simulation of a DNA nanorobot that executes a random walk (with orthogonal steps) on a
2D nanotrack


4.5 A Novel DNA Nanorobot that Executes a Self-Avoiding 
Walk 


A self-avoiding walk (SAW) is a sequence of moves on a lattice (a lattice path) that
does not visit the same point more than once. SAWs have a number of important
applications, e.g., in the modeling of nucleic acids, peptides, and proteins. It is known
[50] that a self-avoiding random walk on the 2D square lattice lasts an average of
approximately 71 steps before the walker is trapped. (Note: We could modify
the design given below to decrease the likelihood that the walker gets trapped at a
p0 pad position, and so increase the average number of steps before the walker is
trapped, but the resulting system would then no longer correspond to the classical
self-avoiding random walk on the 2D square lattice.)

Here, we describe the design of a DNA nanorobot that makes a random self-avoiding
traversal of a 2D DNA nanotrack (which is a 2D square lattice). It is the first
to execute a self-avoiding walk without use of enzymes. There are two types of pads:
(i) ssDNA p0 = B* A* attached at its 3’ end and (ii) DNA hairpin p1 = C* B* E1* E2* B
attached at its 5’ end. Figure 4 illustrates (for simplicity on a 1D nanotrack) the
self-assembly of the pads into DNA hairpins. Figure 5 gives the 2D arrangement of these
pads on a 2D nanotrack. Note: To improve stability of the hairpin p1, its loop portion
E1* E2* needs to be relatively short compared to its stem sequences B and B*. The
initial buffer solution S0 is a conventional buffer solution with the component strands
of a DNA nanotrack, and the buffer solution S1 is S0 plus a high concentration of
another DNA hairpin type h1 = E2 E1 B E1*.

The operation of a self-avoiding walker W = ABC is as follows: 


(a) A low concentration of W walker strands is added to the initial buffer solution
S0 containing the nanotrack, and a few of these W strands hybridize to random
pads of the nanotrack.

(b) The buffer solution S0 is replaced with buffer solution S1, so as to remove the
remaining non-hybridized W strands from the solution surrounding the nanotrack,
and to add the DNA hairpin h1.

(c) As in Fig. 4, the walker W first hybridizes with a p0 = B* A* pad, which is State
0. In State 1, the unpaired domain C of W hybridizes with the domain C* of the
p1 pad. Then the domain B* of p1 can displace the domain B* of p0, so that the
domain B of W hybridizes with both p0 and p1 and the hairpin p1 is opened, giving
intermediate State 1.5. (Note that this transition process is reversible.) If and when
the strand displacement finishes, the system enters State 2, in which W is hybridized
with a p1 pad. In summary, in transitioning from State 0 to State 2, the W nanorobot
walks from a p0 pad to a p1 pad by hybridization and strand displacement.

(d) (Note that our design includes a way to keep W from stepping to the next pad
and thereby letting the track hairpin reform.) The ssDNA strand h1 is in high
concentration in buffer solution S1, with a domain E2 that can hybridize with the
newly released E2* domain of p1, so h1 is then opened and displaces the domain
B of W. The unpaired domain AB of W can hybridize with the next p0, so W
moves from p1 to p0, entering State 1. After some time, W detaches from p1
and hybridizes with a p0 pad, taking it back to State 0. The previously visited
p1 then hybridizes with a hairpin h1 from the buffer solution, which hinders the
reformation of the hairpin of p1, so W cannot move back to the visited p1. (Also,
note that the small length of the loop of W deters h1 from binding to the loop of
W in State 0.)

(e) The W nanorobot walks from either (i) a p0 pad to a non-visited p1 pad of the
nanotrack, or (ii) a p1 pad to a p0 pad of the nanotrack, as described in parts (c)
and (d). As a result, the W nanorobot avoids the locations visited by itself or any
other DNA nanorobot when it moves on the nanotrack. (A similar design can also be
used for foraging nanorobots.)

Fig. 4 Design of a DNA nanorobot that executes a self-avoiding walk (for simplicity only illustrated
in 1D)

Fig. 5 2D locations of the self-avoiding walker’s pads on a 2D nanotrack

Fig. 6 Simulation of a DNA nanorobot that executes a self-avoiding walk

Figure 6 (also see mp4 at [43]) gives an example simulation run for a W nanorobot
(in green) that executes a self-avoiding random walk on a 2D nanotrack; as it randomly
walks on the nanotrack, each grid cell it visits turns orange to show the trace of the W
nanorobot. The W walker moves randomly over the 2D nanotrack and eventually
stops when all the pads around it have been visited.
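The trapping behavior can be checked with a small Monte Carlo sketch. It assumes an unbounded square lattice (unlike the finite nanotrack of the simulation) and treats every lattice point alike, ignoring the distinction between p0 and p1 pads; the function names `self_avoiding_walk` and `mean_trapping_length` are chosen for this example only. Under these assumptions, the mean number of steps before trapping should come out close to the value of approximately 71 quoted from [50].

```python
import random

def self_avoiding_walk(rng):
    """Walk on an unbounded square lattice until trapped; return the steps taken."""
    x, y = 0, 0
    visited = {(0, 0)}
    steps = 0
    while True:
        # Unvisited orthogonal neighbours of the current position.
        free = [(x + dx, y + dy)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                if (x + dx, y + dy) not in visited]
        if not free:          # all four neighbours visited: the walker is trapped
            return steps
        x, y = rng.choice(free)
        visited.add((x, y))
        steps += 1

def mean_trapping_length(trials=2000, seed=1):
    """Average number of steps before trapping over independent walks."""
    rng = random.Random(seed)
    return sum(self_avoiding_walk(rng) for _ in range(trials)) / trials

print(mean_trapping_length())
```

On a finite nanotrack the boundary also blocks moves, so trapping would be expected somewhat earlier than in this unbounded model.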


4.6 Flocking: Novel DNA Nanorobots that Follow a Leader 


This is a DNA nanorobot system with two types of DNA nanorobots (a designated
leader and the followers): the leader makes a random traversal of a 2D nanotrack, and
the other DNA nanorobots follow the movements of the leader. Let B2 = B2a B2b.
There are two types of pads: (i) h1 = B1 B2* A2* B2a B1* A1* attached at its 3’ end and
(ii) h2 = C1* B1* B2a C2* B2* B1 attached at its 5’ end. Figure 7 illustrates (for
simplicity on a 1D nanotrack) the self-assembly of the pads into DNA hairpins. The 2D
arrangement of these pads on a 2D nanotrack is illustrated in Fig. 8.
Note: Observe that h1 and h2 each self-assemble into a type of hairpin with short
loops: h1 has two separate short loops, one with A2* and another with B2b*, whereas
h2 has one short loop with B2b* C2*. This use of small hairpin loops is a deliberate
design choice with the goal of inhibiting the hybridization of a follower W2 with
A2*, B2b*, C2*, B2b*. This makes it more difficult for a follower W2 to avoid following
a leader W1. Note that to ensure stability of each of the hairpins, their loop portions
need to be relatively short compared to the B1 and B1* portions of their stems.
There are two types of ssDNA nanorobots, the leader W1 = A1 B1 C1 and the
follower W2 = A2 B2 C2, which operate as follows:


(a) W1 (leader) and W2 (follower) strands are added to the buffer solution containing
the nanotrack.

(b) The leader W1 hybridizes with h1 on the domain A1 and opens the hairpin of h1
by strand displacement; then a follower W2 can hybridize with the newly opened
h1.

(c) The leader W1 moves from pad h1 to pad h2 by hybridizing with C1* of h2 and
opening h2 by strand displacement; then a follower W2 can hybridize with the
newly opened h2.

(d) After the W1 and W2 walkers leave these pads, a limited number of further W2
strands can walk nearby, and similarly follow leader W1.

(e) As in Fig. 7 (for simplicity it is only illustrated in 1D), the leader W1 walks
successively from either (i) an h1 pad to an h2 pad of the nanotrack, or (ii) an h2
pad to an h1 pad of the nanotrack. Whenever a follower W2 strand hybridizes with
an h1 pad, the hairpin h1 needs to be open (the loop of the hairpin was
opened by a leader W1, and the loop closes after some time), so W2 is forced
to follow the leader W1. Hence, leader W1 walks randomly over the nanotrack
and is followed by a group of W2 followers. (The maximum size of the group of W2
followers is limited by the duration that h1, h2 remain open, and this parameter
can be set by appropriate DNA strand design.)

Fig. 7 Design of DNA nanorobots that follow a leader DNA nanorobot

Figure 9 (also see mp4 at [43]) gives an example simulation run for a group
of DNA nanorobots that follow a leader DNA nanorobot randomly walking on a 2D
nanotrack. The yellow and blue cells are the W1 (leader) and W2 (follower) nanorobots,
respectively. The W1 walker can randomly walk on the nanotrack, while a W2
walker can only follow the W1 walker or stay in place. If the W2 walker follows the
W1 walker, the grid cells it visits turn green to show the trace of the W2 nanorobot;
if the W2 walker does not follow the W1 walker, it stays in place and remains blue
until the W1 walker again visits a neighbor of W2.
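The leader and follower rule can be illustrated with a simplified lattice sketch. It is not the Surface CRN model: the parameter `open_time` (how many ticks a hairpin stays open after the leader visits), the grid size, the random starting cells of the followers, and the function names are all assumptions for this example, and several followers are allowed to share a pad for simplicity. A follower moves only onto a neighboring pad whose hairpin is currently open, and otherwise stays in place.

```python
import random

DIRS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def neighbours(cell, size):
    """Orthogonal neighbours of a cell on a size x size track."""
    x, y = cell
    return [(x + dx, y + dy) for dx, dy in DIRS
            if 0 <= x + dx < size and 0 <= y + dy < size]

def simulate_flock(size=10, steps=300, open_time=5, n_followers=3, seed=0):
    rng = random.Random(seed)
    leader = (size // 2, size // 2)
    followers = [(rng.randrange(size), rng.randrange(size))
                 for _ in range(n_followers)]
    open_until = {leader: open_time}   # pad -> last tick at which its hairpin is open
    leader_visited = {leader}
    follower_moves = []                # pads that followers stepped onto
    for t in range(1, steps + 1):
        # The leader W1 takes one random step and opens the hairpin it lands on.
        leader = rng.choice(neighbours(leader, size))
        open_until[leader] = t + open_time
        leader_visited.add(leader)
        # Each follower W2 may step only onto an adjacent pad that is still open.
        for i, f in enumerate(followers):
            open_pads = [c for c in neighbours(f, size)
                         if open_until.get(c, -1) >= t]
            if open_pads:
                followers[i] = rng.choice(open_pads)
                follower_moves.append(followers[i])
    return leader_visited, follower_moves

visited, moves = simulate_flock()
print(len(visited), len(moves))
```

In this sketch, a larger `open_time` lets more followers join the trail, which corresponds to the remark that the group size is limited by how long the hairpins remain open.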


4.7 Novel DNA Nanorobots that Vote by Assassination 


Fig. 9 Simulation of DNA nanorobots that follow a leader DNA nanorobot

Distributed voting is essential to many distributed computing and population
protocols [51-53], where processors are restricted to pair-wise interactions. For example,
distributed voting can be used for leader election, which grants one process in a
distributed system some special powers, often allowing for
simplified protocols, reduced coordination, and improved efficiency. The task of
determining approximate majority in distributed computing can be reduced to the
case where each processor has a binary value in {0, 1}; assuming an initial margin of
disparity δ > 1 between the numbers of processors with value 0 and value 1, the problem
is for the set of processors to settle on a majority value in {0, 1}. A fast randomized
distributed protocol for approximate majority was given by Angluin et al. [54, 55].
(Interestingly, Cardelli [56] observed that this approximate majority protocol was
used in certain cell cycle switches.) Let an event with size parameter n be high
probability if it has likelihood ≥ 1 − 1/n^a for some constant a > 1. Angluin et al.
[54, 55] proved that with high probability, their randomized distributed protocol with
n processors reaches consensus on a majority value after O(n log n) pair-wise
interactions, assuming that the initial margin of disparity is ≥ δ = c √n log n for a
constant c > 1. A slightly modified version of their protocol proceeds in O(n log n)
stages, where in each stage a random pair of processors compare their values; if their
values are the same, they do nothing, and otherwise, a random processor of the pair
drops out from subsequent stages of the protocol. Afterward, with high probability,
only processors with the same majority value remain. Then the other processors that
previously dropped out are informed of that majority value.
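The modified drop-out protocol described above can be sketched directly. This is a minimal simulation under stated assumptions: pairs are drawn uniformly at random from the processors still active, the function name `dropout_majority` is hypothetical, and the final step of informing the dropped-out processors is omitted.

```python
import random

def dropout_majority(n0, n1, seed=0):
    """Run the drop-out protocol; return (surviving value, number of stages)."""
    rng = random.Random(seed)
    active = [0] * n0 + [1] * n1       # values held by the active processors
    stages = 0
    while 0 in active and 1 in active:
        # One stage: a random pair of active processors compare their values.
        i, j = rng.sample(range(len(active)), 2)
        stages += 1
        if active[i] != active[j]:
            # Values differ: a random member of the pair drops out.
            active.pop(rng.choice((i, j)))
    return active[0], stages

value, stages = dropout_majority(60, 20)
print(value, stages)
```

With a large initial margin, as in the usage line, the surviving value is expected to be the initial majority value in nearly every run; with a small margin the outcome becomes much less reliable.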

Condon et al. [57, 58] have given chemical reaction networks (CRNs) for approximate
majority, which can in principle be implemented by strand-displacement reactions
operating in solution. There has been no previous localized reaction protocol
for approximate majority; a localized reaction protocol for approximate majority
would likely operate far faster than a solution-based protocol, which is delayed by
diffusion.

Fig. 10 Separate walks of assassinator nanorobots W1 and W2


Fig. 11 2D locations of the

We expect distributed voting to be also of central importance to programming
complex behavior in social DNA nanorobots. The fast randomized distributed protocol
for approximate majority given by Angluin et al. [54, 55] inspired the design of the
DNA nanorobot assassination protocol described here. Our design for distributed
voting of DNA nanorobots has the nanorobots exhibit an anti-social behavior to
achieve group decision making. The idea is that the DNA nanorobots vote by
assassination. There are originally two groups of DNA nanorobots of unequal size (with
a sufficiently large size difference); when a pair of DNA nanorobots from distinct
groups collide, one or the other is assassinated (it is detached from the 2D nanotrack
and then diffuses into the solution away from the 2D nanotrack). Eventually, all
members of the smaller of the two groups of DNA nanorobots are assassinated with high
likelihood.

Let C = Ca Cb Cc, where |Ca| = |Cc| is between 3 and 5 bases (sufficient to
act as toeholds) and |Cb| > 10. There are three types of pads: (i) p0 = (B2)* C* (B1)*
attached at its 5’ end and with the ssDNA sequence Cb hybridized to the
complementary subsequence Cb* of p0, (ii) p1 = (B1)* (A1)* attached at its 3’ end, and
(iii) p2 = (A2)* (B2)* attached at its 3’ end. The 2D arrangement of pads on a
2D nanotrack is illustrated in Fig. 11. There are two types of ssDNA nanorobots:
W1 = A1 B1 C and W2 = C B2 A2. Figure 10 illustrates the separate walks of
assassinator nanorobots W1 and W2 (for simplicity it is only illustrated in 1D).

Fig. 12 Design of DNA nanorobots that vote by assassination where n1 = 10 and n2 = 6


There is also a short ssDNA sequence Cb initially hybridized to the complementary
subsequence Cb* of p0, which has the purpose of substantially increasing the likelihood
that W1 and W2 will simultaneously bind to a pad p0. Once W1 or W2 is partly
bound to p0, it still has to engage in a relatively slow strand-displacement reaction to
dislodge the Cb that was already hybridized to the complementary subsequence
Cb* of p0, increasing the likelihood that a second nanorobot also attaches to p0. (Cb
needs to be in sufficient concentration in the solution to allow it to re-hybridize to
Cb* of p0 if strand-displaced.)

Initially, a combination of unequal concentrations of W1 and W2 strands is added
to the buffer solution containing the nanotrack; some W1 and W2 strands hybridize
to random pads of the nanotrack. The buffer solution is replaced, so as to remove the
remaining non-hybridized W1 and W2 strands from the solution surrounding the
nanotrack. Let n1 and n2 be the (unknown) numbers of W1 and W2 strands initially
attached to the nanotrack, and let n = n1 + n2. We assume n > 0 and the initial
margin of disparity |n1 − n2| ≥ δ with δ = c √n log n and constant c > 1. Our goal
is to test whether n1 > n2 or n1 < n2.

Randomized Assassination Protocol: The nanorobots W1 and W2 operate as follows:


(a) As in Fig. 10, W1 and W2, when separate, walk randomly over the nanotrack.
The W1 nanorobot walks only on the p0 and p1 pads of the nanotrack, whereas
the W2 nanorobot walks only on the p0 and p2 pads of the nanotrack.

(b) As in Fig. 12, whenever a W1 and a W2 nanorobot collide at a common p0
pad of the nanotrack, a random one of the W1 or W2 nanorobots is partially
detached via strand displacement at domain C and then melts off to enter
the buffer solution. By this process, pairs of W1, W2 duel, and at random one of
the nanorobots assassinates the other (which dissociates from the nanotrack).


Note that often only one of W1 or W2 will arrive at a p0 pad of the nanotrack,
in which case there will be no competition for hybridization at domain C, and no
assassination. However, for the protocol to operate correctly, there just needs to be
a constant positive probability that a W1 and a W2 nanorobot collide at a common
p0 pad of the nanotrack. It is shown below that there is only a constant probability of
the detached nanorobot Wi reattaching to the 2D nanotrack.

Probabilistic Analysis of Re-attachment Likelihood of a Detached (assassinated)
Nanorobot: We assume:


• The protocol is executed over some finite time duration.

• The size of each 2D nanotrack is very small compared to the width of the test tube
containing the buffer solution.

• The detached (assassinated) nanorobot Wi takes a random 3D walk within the
volume of the test tube after detaching from its 2D nanotrack.

• There is a very small concentration of 2D nanotracks in the buffer solution, so there
is a very low likelihood ρ0 > 0 that a detached nanorobot Wi later attaches to another
2D nanotrack also in the buffer solution of the test tube during the duration of the
protocol.


Consider a sphere S fully containing the 2D nanotrack (and 3 times its diameter)
from which a nanorobot Wi is detached during the assassination protocol. It is
easy to see that there is a constant probability ρ1 > 0 that a random 3D walk of Wi
takes it outside S. The further random movement of nanorobot Wi can be modeled
by a random walk on a 3D grid whose nodes correspond to spheres of the same
diameter as S. By Pólya’s recurrence theorem [59, 60], in a random walk on a 3D
grid starting at a given start node, the likelihood of never re-visiting that start node
is a constant ρ2 > 0. The nanorobot never re-attaches if it leaves S, never returns to
S, and does not attach to another nanotrack. Hence during the duration of the protocol:


• There is at least a constant likelihood ≥ (1 − ρ0) ρ1 ρ2 that the nanorobot Wi that
is detached from a nanotrack during the assassination protocol never re-attaches
to the same nanotrack or any other nanotrack in the test tube, and so

• There is at most a constant likelihood ≤ 1 − (1 − ρ0) ρ1 ρ2 that the detached
nanorobot Wi re-attaches to some nanotrack.
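The constant from Pólya's theorem used above can be estimated numerically. The sketch below assumes a simple random walk on the cubic lattice and truncates each walk after `max_steps` steps, so it slightly overestimates the probability of never returning; the function names and parameter values are illustrative only. For the simple cubic lattice the return probability is known to be roughly 0.34, so the estimate of never returning should come out somewhat below 0.7.

```python
import random

STEPS_3D = ((1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1))

def returns_to_start(rng, max_steps):
    """True if a 3D lattice walk revisits the origin within max_steps steps."""
    x = y = z = 0
    for _ in range(max_steps):
        dx, dy, dz = rng.choice(STEPS_3D)
        x, y, z = x + dx, y + dy, z + dz
        if x == 0 and y == 0 and z == 0:
            return True
    return False

def estimate_escape_probability(trials=1000, max_steps=500, seed=2):
    """Monte Carlo estimate of the chance of never revisiting the start node."""
    rng = random.Random(seed)
    returned = sum(returns_to_start(rng, max_steps) for _ in range(trials))
    return 1 - returned / trials

print(estimate_escape_probability())
```

In the analysis above, each lattice node stands for a sphere of the same diameter as S, so this estimate plays the role of the constant for never re-visiting the start node.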


Probabilistic Analysis of Outcome of Assassinating Nanorobot Protocol: We further
assume that whenever a W1 and a W2 nanorobot collide at a common p0 pad of
the nanotrack, it is equally likely that the W1 or the W2 nanorobot is detached from the
nanotrack. From the above, there is at least a constant likelihood that the detached
nanorobot never re-attaches to any nanotrack during the duration of the protocol.
Observe that as a consequence, ultimately either:


• one or more W1 remain attached to the nanotrack and all the W2 are detached, or
• one or more W2 remain attached to the nanotrack and all the W1 are detached.


Angluin et al. [54, 55] gave a similar randomized protocol for approximate majority
(for distributed computing applications), and their probabilistic analysis implies that
if the initial margin of disparity is |n1 − n2| ≥ c √(n1 + n2) (for some constant c > 0),
then ultimately, with high probability:


• if at least one W1 remains attached to the nanotrack, then n1 > n2, and
• if at least one W2 remains attached to the nanotrack, then n2 > n1.

Fig. 13 Simulation of DNA nanorobots that vote by assassination where n1 = 10 and n2 = 6

Fig. 14 Simulation of DNA nanorobots that vote by assassination where n1 = 8 and n2 = 8
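The outcome analysis above can be made concrete in an idealized model. Assume that every duel removes a W1 or a W2 with equal probability and that detached nanorobots never re-attach; the geometry of the nanotrack is ignored. Under these assumptions, the probability that W1 is the surviving group satisfies a simple recurrence, computed exactly below. The function name `p_w1_survives` is hypothetical.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_w1_survives(n1, n2):
    """Probability that all W2 are detached before all W1, with fair duels."""
    if n2 == 0:
        return 1.0      # no W2 left: W1 has won
    if n1 == 0:
        return 0.0      # no W1 left: W2 has won
    # A duel removes one W1 or one W2, each with probability 1/2.
    return 0.5 * p_w1_survives(n1 - 1, n2) + 0.5 * p_w1_survives(n1, n2 - 1)

print(p_w1_survives(8, 8))    # equal groups: both sides equally likely to win
print(p_w1_survives(10, 6))   # W1 starts with the majority
```

For equal group sizes the recurrence gives exactly 1/2 by symmetry, and for a fixed ratio n2/n1 the survival probability of the majority grows with n in this idealized model.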


Simulation of the Assassination Protocol: 


• Figure 13 gives an example simulation run for two groups of DNA nanorobots with
unequal sizes that vote by assassination on a 2D nanotrack; the number of W1 is
larger than the number of W2 initially attached to the nanotrack (n1 = 10, n2 = 6),
and eventually all W2 DNA nanorobots are assassinated.

• Figure 14 gives an example simulation run for two groups of DNA nanorobots
with equal sizes that vote by assassination on a 2D nanotrack; the numbers of
W1 and W2 strands initially attached to the nanotrack are equal (n1 = n2 = 8), so
they have the same chance to win the game, and in this simulation, eventually all
W2 DNA nanorobots are assassinated.

• Figure 15 (also see mp4 at [43]) gives the results of a collection of simulation runs
for DNA nanorobots voting by assassination on a 2D nanotrack with different
initial n, where the number of W1 is larger than or equal to the number of W2
(n1 ≥ n2) initially attached to the nanotrack. The X-axis and Y-axis represent the
initial n2/n1 and the final n2/n1, respectively. Because n1 ≥ n2 initially, it is more
likely that eventually W1 remains attached to the nanotrack and all the W2 are
detached, so that the final n2/n1 converges to 0. For the same n, the smaller the
initial n2/n1, the more likely the final n2/n1 is to approach 0. The larger n is, the
more likely the final n2/n1 is to approach 0.

Fig. 15 Results of a collection of simulation runs for DNA nanorobots voting by assassination on
a 2D nanotrack with different initial n, where n1 ≥ n2


5 Discussion 


We described social DNA nanorobots: these are autonomous mobile DNA nanorobots
that execute a series of pair-wise interactions that determine an overall desired
outcome behavior for the group of nanorobots. Our goal was to increase the complexity
of the various tasks the nanorobots can execute and at the same time preserve a
low design complexity for individual nanorobots. We presented detailed designs for
social DNA nanorobots that perform novel behaviors of self-avoiding walking, flocking,
and voting by assassination, and their behaviors were simulated in the 2D surface
CRN model.


5.1 Further Development of Simulation Software for Social 
Nanorobots 


We are developing software extending the Surface CRN Simulator of Clamons [1, 2]
specifically for use with social DNA nanorobots. The software will allow high-level
specification and visualization of state transitions (modeled by chemical reactions
such as toehold-mediated strand displacements and dehybridizations) between DNA
walkers and DNA strands affixed to a 2D surface. The software could provide an
editable catalog of DNA nanorobot devices and improved visualization, and could allow
automatic, incrementally optimized performance assessment. This software should
significantly improve performance assessment and design optimization. We are also using
Visual DSD [61] to simulate the DNA hybridization and strand-displacement
reactions of the individual DNA nanorobots and between pairs of DNA nanorobots.


5.2 Experimental Demonstrations of Social DNA Nanorobots 


We are planning to experimentally demonstrate the behaviors of the social DNA
nanorobots on 1-dimensional DNA nanotracks and 2-dimensional DNA origami. After each
design has been experimentally demonstrated and assessed, it may also be
redesigned for further optimization.


5.3 Further Social DNA Nanorobot Behaviors 


Our novel designs presented here for DNA nanorobots (self-avoiding walking, flocking,
and voting by assassination) can be employed in designs for even more complex
behavior. For example, other behaviors of interest for DNA nanorobots include:

• Guarding, where a group of DNA nanorobots follows and guards a particular DNA
nanorobot from attack by another group of DNA nanorobots. Here we expect that we can
employ parts of our flocking design.

• Attacking, where a group of DNA nanorobots attacks another group of DNA
nanorobots. Here we expect that we can employ a simplification of the assassination
design.

• Foraging and Harvesting. In foraging, a group of designated foraging DNA
nanorobots randomly walk on the 2D nanotrack and can transform to a “discovery
state” when they discover a target molecule (e.g., a group of gold nanoparticles
attached to the 2D surface). In harvesting, a group of harvesting DNA nanorobots
follow the trail of foraging DNA nanorobots in the discovery state, pick up the
detected target molecules, and deliver them to a designated region of the 2D
nanotrack. Designs for foraging nanorobots may employ our designs for self-avoiding
random walking, and designs for harvesting nanorobots may employ our design for
flocking.


5.4 Communication Between Distant Social Nanorobots 


Prior Use of Potential Fields for Generating Autonomous Group Social Activities:
Another source of inspiration for collective behavior strategies by groups can
be found in biology: for example, flocking of animals such as birds and schooling of
amphibious animals. The behavior of these animals has been modeled by mathematical
models and computer programs. In 1989, Beni [62, 63] developed one of the first
such flocking models, which he called swarm intelligence, and applied it to
multi-robot motion planning systems. Subsequently the field of swarm intelligence
[64, 65] and artificial flocking grew rapidly and found applications in many applied
areas in addition to robotics, such as computer graphics. In 1994, Reif [66, 67]
developed a general programmable scheme for multi-robot motion planning, termed 
Social Potential Fields, which made use of artificially defined potential fields that 
controlled the individual robots by weighted sum of decreasing functions of the dis- 
tance and direction of other local robots; he demonstrated various autonomous group 
social activities, including flocking, attacking, and guarding, using the Social Poten- 
tial Fields technique. Unfortunately, the potential field models assume far-distance 
field effects that are not easy to implement using local interactions between co-located 
DNA nanorobots. 


Using Instead Diffusion of Pheromone-like DNA Molecules for Communica- 
tion Between Social Nanorobots: Recall that the communication signals between 
social insects include pheromones; these are chemical factors that can trigger a social 
response in members of the same species [10]. We are exploring the use of diffu- 
sion of DNA molecules for communication between social nanorobots in a manner 
similar to the use of pheromones in social insects. For example, this technique may 
be employed by foraging and harvesting nanorobots. Suppose the goal is to detect
and harvest a particular target molecule on the 2D surface on which the foraging and
harvesting nanorobots walk. The 2D surface would be decorated with additional
hairpins that, when opened by a foraging nanorobot (to announce the discovery of a
nearby target molecule), would act as pheromones for the foraging nanorobots (e.g.,
the speed of the motion of nearby harvesting nanorobots could be temporarily increased
when encountering such an opened hairpin).


Abstract We summarize our work on gellular automata, which are cellular automata
we intend to implement with gel materials. If cellular automata are implemented as 
materials, it will become possible to realize smart materials with abilities such as 
self-organization, pattern formation, and self-repair. Furthermore, it may be possible 
to make a material that can detect the environment and adapt to it. In this article, 
we present three models of gellular automata, among which the first two have been 
proposed previously and the third one is proposed here for the first time. Before 
presenting the models, we briefly discuss why cellular automata are a research target 
in DNA computing, a field which aims to extract computational power from DNA 
molecules. Then, we briefly describe the first model. It is based on gel walls with holes 
that can open and exchange the solutions that surround them. The second model is 
also based on gel walls but differs in that the walls allow small molecules to diffuse. 
In presenting the second model, we focus on self-stability, which is an important 
property of distributed systems, related to the ability to self-repair. Finally, we report 
our recent attempt, in the third model, to design gellular automata that learn Boolean 
circuits from input-output sets, i.e., examples of input signals and their expected 
output signals. 


1 Introduction: Why Cellular Automata? 


DNA computing is a research field that focuses on extracting computational power 
from DNA molecules and their reactions. Implementing a Turing machine using DNA 
is a typical example of research in this field. Research to realize cellular automata 
using molecules such as DNA has also been actively conducted. Why are cellular 
automata a target of research in DNA computing?


1.1 Computation by Molecules


The first attempt to perform computations using molecules was a thought experi- 
ment in which molecules were used to make Turing machines. Research on molec- 
ular Turing machines began with Bennett’s concept of a biological computer [1]. 
Since then, proposals to realize Turing machines and various kinds of automata by 
molecules have been repeated, including the recent work on Turing machines by 
King et al. [2]. The work by Benenson et al. is a typical example of implementing 
finite automata [3]. 

However, the computational power of a single molecule is limited because it is not 
easy for different parts of a molecule to interact and exchange information. Therefore, 
the possibility of computing with a chemical solution containing a huge number of 
molecules has been explored. Adleman’s research on data parallel computation by 
molecules sits in this context [4]. 

Since then, in DNA computing, research on computation by a chemical solution 
has progressed from logic circuits to neural networks. For example, some studies 
implemented logic circuits and neural networks using seesaw gates [5, 6]. 

In parallel with such efforts, the concept of a chemical reaction network (CRN) has 
been widely adopted as a unified standard model for computation using a chemical 
solution, and theoretical research on computation by CRN is being conducted [7]. 

However, the computational power of a chemical solution that is uniform in space 
is also limited. Information should be encoded as concentrations of molecular species. 
Therefore, many researchers in the field became interested in cellular automata, a 
computational model that divides space into cells. Researchers began implementing 
cellular automata using molecules. Historically, the self-assembly of DNA tiles, 
pioneered by Seeman and Winfree, was modeled using cellular automata [8]. In their 
model, the position where each tile is placed is considered a cell. In the process of 
self-assembly, each cell transitions from an empty state to a state in which it is filled 
with a tile. 


1.2 Smart Materials 


Researchers in DNA computing have continually asked this question: What are the 
possible applications of the computational power extracted from molecules? Among 
the various proposals, smart materials, i.e., those that exhibit intelligent behaviors, 
are considered promising [9]. 

If cellular automata are implemented as materials, it will become possible to 
realize materials with abilities such as self-organization, pattern formation, and self- 
repair. For example, suppose such a smart material is used to implement an artificial 
blood vessel. The vessel would form autonomously at an appropriate place in the 
living body, and it would recognize any deformation or occlusion of itself and
self-repair. Note that cellular automata capable of simple self-assembly, such as those
consisting of DNA tiles, are not sufficiently complex to make such smart materials. 

With more sophisticated cellular automata, it is possible to envision smart mate- 
rials that learn appropriate functions from external stimuli. For example, as we report 
later in this article, we have designed cellular automata that form a logic circuit from 
input-output sets in a three-dimensional (3D) cellular space. 


1.3 Why Discrete? 


When envisioning smart materials, such as the ones above, one may wonder why 
these should be discrete. It might be possible to realize the same functions in a contin- 
uous medium, for example, by taking advantage of Turing patterns and developing 
design principles for continuous media [10-12]. However, using current methods, 
the rational design of continuous media is challenging. Tuning a huge number of 
continuous parameters to produce target shapes or behaviors is a nontrivial task in 
general because these parameters are intricately intertwined. 

By contrast, we can take advantage of a sizeable accumulation of research on 
cellular automata. Therefore, we consider it an appropriate approach to design a
discrete model while implementing the model in a continuous medium, such as a
reaction-diffusion system. In addition, as we shall see in the next section,
implementation at the molecular level is naturally discrete and therefore has a high
affinity with cellular automata.


2 Implementation of Cellular Automata 


Methods to implement cellular automata have been actively investigated in DNA 
computing, as we shall see below, but no optimal method has been established. 
Proposed methods can be classified into two groups: those at the molecular level and 
those using reaction-diffusion systems.


2.1 Molecular Level 


Yin, Sahu, Turberfield, and Reif proposed a molecular-level implementation [13]. 
Each cell is represented by a strand of DNA, doubled in part, which is attached to a 
1D track. Each partially double strand has a single-stranded portion that represents its 
state and is changed by a restriction enzyme. The entire reaction system is complex, 
and therefore, its implementation is very challenging. 

Qian and Winfree proposed CRNs on a surface [14]. Again, each cell is represented
by a partially double strand of DNA with a single-stranded portion. It makes a state
transition by replacing its single strand with a new one. Two neighboring cells make
state transitions at the same time. That is, by the reaction A + B → C + D, a cell
in state A transitions to state C, and a neighboring cell in state B transitions to state
D. The reaction A + B → C + D is a basic form of CRN called a population
protocol. In other words, Qian and Winfree proposed realizing population protocols 
in a 2D cellular space, i.e., on a surface. Note that population protocols have been 
investigated as a standard model of distributed computing, and various distributed 
algorithms can be represented in this form. 
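
As a rough illustration of how a reaction of the form A + B → C + D acts on a 2D cellular space, the following Python sketch picks a random pair of adjacent cells and rewrites both states when a rule matches. It is only a toy model under stated assumptions and not the strand displacement system of [14]: the function name `step`, the grid representation, and the example rule table (a one-way epidemic, I + S → I + I) are all introduced here for illustration.

```python
import random

def step(grid, rules, rng):
    """Try one bimolecular reaction on a random pair of adjacent cells.

    grid  : list of lists of state labels (the 2D surface)
    rules : dict mapping an ordered pair (A, B) to its products (C, D)
    Returns True if a reaction fired.
    """
    rows, cols = len(grid), len(grid[0])
    r, c = rng.randrange(rows), rng.randrange(cols)
    dr, dc = rng.choice([(0, 1), (1, 0), (0, -1), (-1, 0)])
    r2, c2 = r + dr, c + dc
    if not (0 <= r2 < rows and 0 <= c2 < cols):
        return False  # the chosen neighbor lies off the edge of the surface
    pair = (grid[r][c], grid[r2][c2])
    if pair not in rules:
        return False  # no reaction is defined for this pair of states
    # Both cells change state at the same time: A -> C and B -> D.
    grid[r][c], grid[r2][c2] = rules[pair]
    return True

# Hypothetical example protocol: I + S -> I + I (listed in both orders).
rules = {("I", "S"): ("I", "I"), ("S", "I"): ("I", "I")}
grid = [["S"] * 4 for _ in range(4)]
grid[0][0] = "I"
rng = random.Random(0)
for _ in range(20000):
    step(grid, rules, rng)
```

Because pairs are chosen one at a time, the update is asynchronous, which is the usual scheduling assumption for population protocols.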

For implementing population protocols on a surface, Qian and Winfree proposed a 
very sophisticated system consisting of many reactions, called a strand displacement 
system, but it is not known whether its implementation by DNA is feasible. Despite 
the apparent difficulty of its implementation, theoretical research on CRNs on a 
surface is progressing [15]. 

A new design for molecular-level implementation by data parallel computation 
based on strand displacement was recently proposed with a simulation of Rule 110 
elementary cellular automata [16]. 
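
For readers unfamiliar with Rule 110, the following Python sketch shows the update rule itself. It illustrates only the abstract automaton, not the strand displacement design of [16]; the periodic boundary condition and the function name `rule110_step` are assumptions made here for a self-contained example.

```python
RULE = 110  # the rule number, read as an 8-bit lookup table

def rule110_step(cells):
    """One synchronous update of a row of 0/1 cells with periodic boundaries."""
    n = len(cells)
    out = []
    for i in range(n):
        left, center, right = cells[i - 1], cells[i], cells[(i + 1) % n]
        index = (left << 2) | (center << 1) | right  # neighborhood as a 3-bit number
        out.append((RULE >> index) & 1)              # look up that bit of 110
    return out

# Print a few generations starting from a single 1 at the right edge.
row = [0] * 15 + [1]
for _ in range(8):
    print("".join(".#"[c] for c in row))
    row = rule110_step(row)
```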

The self-assembly of DNA tiles, mentioned above, realizes a restricted class of 
cellular automata in which each cell can make a transition only once. This restriction 
is not appropriate for smart materials that are expected to repair themselves after 
external disturbances, although 2D self-assembly can simulate the spatiotemporal 
evolution of 1D cellular automata [17-19]. 

Compared to reaction-diffusion systems, molecular-level proposals of cellular
automata have advantages including their size and the speed of state transitions. 
However, all proposals other than those for self-assembly are based on complex 
reactions of DNA and are inherently difficult to implement. In addition, the 
size of molecular-level implementations can be a disadvantage for expressing the 
macroscopic behaviors needed in smart materials. 


2.2 Reaction-Diffusion Systems


Proposals have also been made to implement cellular automata using reaction-diffusion
systems, as discussed above. For example, in the proposal by Scalise and
Schulman, a cell in cellular automata is implemented by a region with a high
concentration of a molecule that is attached to the region and cannot diffuse [20]. Each cell
produces a molecule corresponding to its current state, and this molecule diffuses to 
the neighboring cells; in this paper, we call this a signal molecule. 

In the molecular robotics project led by Hagiya, Murata et al. actively attempted 
to implement cellular automata by a reaction-diffusion system in a gel [21, 22].
Cells were implemented with solutions separated by gel walls. Signal molecules can 
diffuse through gel walls while other molecules stay within each cell’s solution [23]. 
In this project, Hagiya started to use the phrase “gellular automata” for cellular
automata that are suitable for implementation with gels. Based on the implementation 
mentioned above, the second model of gellular automata was proposed, as explained 
in the next section. 

However, it is very challenging to implement reaction-diffusion systems in gel
materials so that cellular automata work as expected. One serious problem is that
signal molecules are not confined to the immediate neighbors of the cell that
generated them but may diffuse to a neighbor of a neighbor. To overcome this,
signals must be decomposed at an appropriate rate so that they do not diffuse
beyond the immediate neighbors. Consequently, energy and ingredients must be
constantly supplied from an external environment.
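
The trade-off between diffusion and decomposition can be made concrete with a textbook estimate. The sketch below is not taken from the cited work; it assumes a one-dimensional geometry, a steady state, first-order decomposition of the signal, and a source held at a fixed concentration. The symbols D (diffusion coefficient), k (decomposition rate constant), c_0 (concentration at the source), and λ (decay length) are introduced here for illustration.

```latex
D\,\frac{d^{2}c}{dx^{2}} - k\,c = 0, \qquad c(0) = c_0, \quad c(x) \to 0 \ \text{as}\ x \to \infty
\quad\Longrightarrow\quad
c(x) = c_0\, e^{-x/\lambda}, \qquad \lambda = \sqrt{D/k}.
```

Under these assumptions, if k is tuned so that λ equals the cell width, the signal falls to about 37% of c_0 (a factor of e^{-1}) at a distance of one cell and to about 14% (a factor of e^{-2}) at a distance of two cells, so a detection threshold between these levels would separate immediate neighbors from neighbors of neighbors.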

Further, if we allow an external environment to actively interact with a
reaction-diffusion system, we can control diffusion via a global effect such as an electric
or magnetic field or chemical gradient. Then, we can enhance models of cellular 
automata using such effects, as in our third model of gellular automata, proposed 
below, where neighboring cells can be distinguished by their directions. 


3 Gellular Automata 


In this article, the phrase “gellular automata” describes several models of cellular 
automata that we have proposed and investigated and which we intend to implement 
with gel materials. In this section, we briefly introduce three such models. The first 
is based on gel walls with holes that can open and exchange neighboring solutions. 
It is somewhat rudimentary in that precise control of the interactions between cells 
is difficult. The second model is also based on gel walls but differs in that the walls 
allow small molecules to diffuse. In describing the second model, we focus on the 
self-stability of the model. Note that this is an important property of distributed 
systems that is crucial for realizing smart materials. In the subsequent section, we 
report our ongoing work on the third model, which learns Boolean circuits from 
input-output sets, i.e., examples of input signals and their expected output signals.


3.1 Gellular Automata with Holes 


The first model we investigated is based on gel walls separating cells of solutions. 
The walls are assumed to have holes that are opened and closed by the solutions that 
surround them [24-27]. 

This model is both continuous and discrete in the sense that, while concentrations 
of molecules are continuous with respect to time, a hole is either open or closed 
depending on a continuous parameter of the hole. The parameter is controlled by 
molecules called composers and decomposers (Fig. 1). 

Fig. 1 Gel walls separate solutions. One wall has a hole. A molecule called a decomposer changes
a parameter of the hole, and the hole eventually opens (top left to top right). When a hole is open, 
adjacent cells are connected, and their solutions mix. A molecule called a composer then changes 
the parameter in the opposite direction and the hole eventually closes (bottom right to bottom left). 
It is assumed that the composer is generated in the mixed solution (top right) 


We demonstrated the Turing computability of the model by encoding rotary 
elements in it [24]. Note that it is possible to implement any reversible Turing machine
using only rotary elements [28]. 

We conducted some preliminary experiments using polyacrylamide gels with 
DNA bridges. We implemented holes that could open once or close once, but we 
could not implement a hole that could open and close repeatedly. Since then, various 
kinds of DNA gels have been developed. It will be a challenge to develop DNA 
gels that can shrink and swell repeatedly and thus be used as valves for opening and 
closing holes. 


3.2 Boolean Total and Non-Camouflage Gellular Automata 


The second model is much simpler than the first and more suited to implementation 
with gels. In this model, gel walls also separate cells of solutions, but communication 
between them is realized by the diffusion of signal molecules, as implemented by 
Abe et al. [23]. 

With such an implementation in mind, we have defined this model as a limited class 
of asynchronous cellular automata [29, 30]. Each cell undergoes a state transition 
only if there are neighboring cells in specified states (Boolean totality). However, 
each cell cannot recognize a cell in the same state as itself (non-camouflage). A 
transition rule can be represented in the following form: 

S → T (P and Q and not R and ...),

which means that a cell in state S will transition to state T if there is a neighboring
cell in state P and a neighbor in state Q but no neighbor in state R; due to being 
non-camouflaged, S must be different from P, Q, and R. 

In addition, state transitions are asynchronous in the sense that each cell may or
may not make a transition at each instant of discrete time.

Note that the above restrictions occur naturally in implementations of cells by 
solutions separated by gel walls. Each cell is assumed to transmit a signal corre- 
sponding to its current state, and therefore, a cell cannot see a neighbor in the same 
state. Due to Boolean totality, it is not necessary to control the direction of diffusion 
or consider the number of neighboring cells transmitting the same signal. 
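
The two restrictions can be expressed compactly in code. The following Python sketch is only an illustration of the rule format S → T (P and Q and not R ...); the class name `Rule`, its field names, and the function `enabled` are assumptions introduced here and are not taken from [29, 30]. Asynchronous scheduling is not modeled: the function only answers whether a rule may fire.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Rule:
    """S -> T (P and Q and not R ...), in the notation of the text."""
    source: str                        # S
    target: str                        # T
    present: frozenset = frozenset()   # states that must appear among the neighbors
    absent: frozenset = frozenset()    # states that must not appear among the neighbors

    def __post_init__(self):
        # Non-camouflage: a rule cannot test for the cell's own state in a neighbor.
        if self.source in self.present | self.absent:
            raise ValueError("rule violates non-camouflage")

def enabled(rule, state, neighbor_states):
    """Boolean totality: only the set of neighboring states matters,
    not the directions or the number of neighbors in each state."""
    seen = set(neighbor_states) - {state}   # same-state neighbors are invisible
    return (state == rule.source
            and rule.present <= seen
            and not (rule.absent & seen))

r = Rule("S", "T", frozenset({"P", "Q"}), frozenset({"R"}))
print(enabled(r, "S", ["P", "Q", "P", "S"]))  # P and Q present, no R
print(enabled(r, "S", ["P", "Q", "R"]))       # blocked by R
```

Converting the neighbor list to a set is what discards direction and multiplicity, mirroring the argument that a cell only senses which signals are present.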

This second model of gellular automata also has sufficient computational power.
Turing computability was demonstrated [29, 30] as was the fact that the model can 
simulate population protocols [31]. This means that the distributed algorithms that 
population protocols can express can also be realized in the model. 

Self-stability is an important property of distributed algorithms; it is essential 
for smart materials. If a material is implemented by gellular automata that realize 
a self-stable distributed algorithm, it will eventually reach a desired configuration 
from any initial configuration. Here, a configuration means a global state of the 
gellular automata, i.e., a mapping of cells to states. Even if an external disturbance 
damages the material, i.e., changes the states of some of its cells, it can self-repair 
from the damaged configuration. Therefore, we tried to show the self-stability of 
various algorithms realized in the model. 

Self-stability consists of safety and reachability. Reachability means that it is 
possible to reach a desired configuration from any initial configuration; this is usually 
demonstrated under the assumption of fair scheduling. Safety means staying in a 
desired configuration forever once it has been reached. 

We took the following approach to demonstrating self-stability. First, the
conditions for desired configurations are expressed as local conditions on each cell.
Next, we show that state transitions of cells in a desired configuration preserve the
local conditions; this yields safety. Finally, we show that, under fairness,
a desired configuration that satisfies the local conditions is always reached; this
yields reachability. Taking this approach, we have proved the self-stability
of the gellular automata for solving mazes (Fig. 2), two-distance coloring, and the
formation of spanning trees [32-34].

We are currently working on 3D gellular automata. In our first attempt, we formu- 
lated an algorithm for solving maze problems in 3D. Then, we extended it to solve 
matching problems in which paths are formed between pairs of cells in 3D (Fig. 3). 
It is hoped that such paths will be able to transport substances or transmit informa- 
tion in smart materials. Our next target is to realize more intelligent behaviors by 
extending paths to circuits. This target led us to introduce our third model of gellular 
automata. 

Fig. 2 Self-stable gellular automata solve a maze problem. The orange cell (bottom right) is the
start, and the blue cell is the goal. Black cells are walls. First, green cells spread from the goal. 
When they reach the start, red cells with different levels spread from it. Eventually, a single path 
consisting of red cells forms from the start to the goal. External disturbances can change the states 
of some cells. For example, some non-black cells may be changed to black ones. Provided that there 
is one start and one goal, a single path from the start to the goal forms again 


3.3 Three-Dimensional Gellular Automata That Learn Boolean Circuits


Recently, we formulated a 3D model of cellular automata that forms Boolean circuits 
from input-output sets. In this problem, Boolean input signals (0 or 1) are given to 
some cells (input nodes) and their states are changed. Expected Boolean output 
signals (0 or 1) are also given to some cells (output nodes). This forms one input- 
output set. More input-output sets are given to the model successively and repeatedly. 
For each set, a circuit gradually forms that produces the expected output signals from 
the input signals. In this way, supervised learning is realized as a kind of pattern 
formation. If smart materials gain such an ability, they can learn by signals from 
their external environment. 

A cell is placed at each lattice point in a 3D space. Each cell therefore has six 
neighbors in its von Neumann neighborhood. Some cells are specified as input or 
output nodes. Other cells are either active or inactive; if a cell is active, it works as an 
OR gate. Signals are fed to input nodes and transmitted from these to output nodes 
along the OR gates in the direction of increasing coordinates. This means that from 
a cell at (x, y, z), a signal can transmit to the cells at (x + 1, y, z), (x, y + 1, z), and 
(x, y, z + 1). For example, if OR gates are placed at (x, y, z) and (x + 1, y, z), then a 
signal transmits from the former to the latter. Note that this directionality prohibits 
loops of signals and makes the construction of circuits tractable. 

However, in this way, we have abandoned Boolean totality in this model: The
signals are transmitted from input nodes only in certain directions. Thus, each cell
is assumed to recognize each neighbor. To implement this model with gel materials,
we must provide a global effect, such as that from an electric field, to provide
directionality, as mentioned above.

Fig. 3 Gellular automata for solving maze problems have been extended to solve matching
problems in three dimensions. Paths are formed between Si and Ti

4 Supervised Learning of Boolean Circuits 

Under our third model of gellular automata, we are currently developing an algorithm 
for constructing Boolean circuits consisting of OR gates. 


4.1 Assumption 


With given Boolean signals at the input nodes, the expected Boolean signals at the 
output nodes are specified as teacher signals. In other words, an example of supervised 
learning consists of given signals at the input nodes and expected signals at the output
nodes. With each set of these, cells make state transitions and form a Boolean circuit 
consisting of OR gates between the input and output nodes. 

For a cell at (x, y, z), the cells at (x - 1, y, z), (x, y - 1, z), and (x, y, z - 1) are called
its “fan-in” cells, and the cells at (x + 1, y, z), (x, y + 1, z), and (x, y, z + 1) are called
its “fan-out” cells.


4.2 States 


The state of each cell consists of the following components. The type can be input, 
output, OR, or none. Input and output nodes are cells of those respective types. Cells 
have type OR if they are active, working as OR gates in a circuit. They have type 
none if they are inactive and do not belong to a circuit. 

Each cell has a Boolean value (0 or 1). This component is called the value compo- 
nent or value and is determined by signals transmitted from input nodes via the OR 
nodes. The value component of an input node is set by the Boolean signal given to 
it. If the type of a cell is OR or output and it has a fan-in cell whose value is 1, then 
its value component is set to 1. If there is no such fan-in cell, it is set to 0. 
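
The rule for value components can be sketched as a single sweep over the lattice. The Python code below is a minimal illustration, not the authors' implementation: the names `fan_in` and `propagate`, the dictionary-based lattice, and the cubic lattice size are assumptions made here. It also assumes that inactive cells (type none) carry the value 0, which the text does not state explicitly.

```python
from itertools import product

def fan_in(p):
    """The three cells from which signals can arrive at p."""
    x, y, z = p
    return [(x - 1, y, z), (x, y - 1, z), (x, y, z - 1)]

def propagate(types, inputs, size):
    """Compute the value component of every cell in a size**3 lattice.

    types  : dict mapping (x, y, z) to "input", "output", "OR", or "none"
             (cells missing from the dict are treated as "none")
    inputs : dict mapping each input node to its Boolean signal (0 or 1)
    """
    value = {}
    # Lexicographic order visits every fan-in cell before the cell itself,
    # because each fan-in cell has exactly one coordinate smaller by 1.
    for p in product(range(size), repeat=3):
        t = types.get(p, "none")
        if t == "input":
            value[p] = inputs[p]
        elif t in ("OR", "output"):
            # 1 if some fan-in cell has value 1; cells outside the lattice count as 0.
            value[p] = int(any(value.get(q, 0) == 1 for q in fan_in(p)))
        else:
            value[p] = 0  # assumption: inactive cells carry no signal
    return value

types = {(0, 0, 0): "input", (0, 1, 1): "input",
         (1, 0, 0): "OR", (1, 1, 0): "OR", (1, 1, 1): "output"}
values = propagate(types, {(0, 0, 0): 1, (0, 1, 1): 0}, size=2)
```

A single pass suffices because signals only travel in the direction of increasing coordinates, so no loops can occur.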

In addition, the state of each cell has a component containing information for
fixing the type of the cell. This component is called the fix component and can be
one of FIX0, FIX1, and none. FIX0 means that the cell has the value component 1, but
it should be 0. FIX1 means the other way around. In the case of FIX1, the component
further specifies one of the fan-in cells of the cell as described in the algorithm below.

In summary, a state of a cell is a tuple in one of the forms (input, v), (t, v, f), and
(t, v, FIX1, n), where v is 0 or 1; t is none, OR, or output; f is none or FIX0; and n is
a number that specifies a fan-in cell.


4.3 Algorithm 


The algorithm for constructing Boolean circuits is formulated as a set of state tran- 
sition rules for changing the fix component and type of each cell. These rules are 
executed for one set of input signals and expected output signals. Before fix compo- 
nents and types are changed by these rules, the value components of the input nodes 
are set according to the input signals in the example set. The values of other cells 
are set according to their current types and the values of their fan-in cells. The fix 
components of the output nodes are set according to their values and expected output 
signals. 

The transition rules are described in pseudo-code as follows.

if the type is output then
    if the value component is 1 and the expected signal is 0 then
        the fix component is set to FIX0
    if the value component is 0 and the expected signal is 1 then
        the fix component is set to FIX1
        a fan-in cell is randomly chosen and specified (*)
if the type is OR and the fix component is none and
   the value component is 1 and one of its fan-out cells has FIX0 then
    if there is a fan-out cell whose type is OR or output with FIX0 or
       there is no fan-out cell whose type is OR or output then
        the fix component is changed to FIX0
        the type is changed to none
if the type is none or OR and the fix component is none and
   the value component is 0 and
   specified by a fan-out cell with FIX1 then
    the fix component is changed to FIX1
    the type is changed to OR (it may already be OR)
    a fan-in cell is randomly chosen and specified (*)

At (*), when the fix component of a cell is changed to FIX1, one of its fan-in cells
is randomly chosen and specified. This choice greatly affects the number of steps 
taken for the algorithm to construct the expected circuit. We currently impose the 
following constraints on the chosen cell as heuristics: 


– It is possible to reach the chosen cell from an input node with the value 1 in the
  direction of increasing coordinates.

– All the fan-in cells of the chosen cell that are OR gates or input nodes have the
  value 1.

– All the fan-out cells of the chosen cell that are OR gates or output nodes have the
  value 1.


Note that checking these constraints requires more components in the state of
each cell. We omit the description of those components and the rules for changing them
in this article.

The state transitions defined above are repeated until no cell can make a transition. 
Then, we eliminate dangling branches of the formed circuit, i.e., those branches that 
do not lie between an input node and an output node. This step can also be realized 
by state transitions, but its description is omitted here. 
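
The transition rules and their repetition can be sketched in Python as follows. This is a simplified reading of the pseudo-code, not the authors' implementation. The names `Cell`, `try_transition`, and `run_rules` are hypothetical; the heuristic constraints on the chosen fan-in cell are omitted, so the choice at (*) is uniformly random; the rule for output nodes is additionally guarded by "the fix component is none" (an assumption added here so that each cell fires at most once); and cells absent from the dictionary are treated as lying outside the lattice. Value components are assumed to have been set beforehand and are not recomputed.

```python
import random
from dataclasses import dataclass
from typing import Optional

OFFSETS = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]  # fan-out directions

@dataclass
class Cell:
    type: str = "none"            # "input", "output", "OR", or "none"
    value: int = 0
    fix: str = "none"             # "none", "FIX0", or "FIX1"
    chosen: Optional[int] = None  # index into OFFSETS of the specified fan-in cell

def fan_out(p):
    return [tuple(a + b for a, b in zip(p, d)) for d in OFFSETS]

def specified_by_fix1_fan_out(cells, p):
    """True if some fan-out cell has FIX1 and names p as its chosen fan-in cell."""
    for i, q in enumerate(fan_out(p)):
        c = cells.get(q)
        if c is not None and c.fix == "FIX1" and c.chosen == i:
            return True
    return False

def try_transition(cells, p, expected, rng):
    """Apply the first enabled rule to cell p; return True if its state changed."""
    c = cells[p]
    if c.fix != "none":
        return False  # assumption: every rule requires an unset fix component
    outs = [cells[q] for q in fan_out(p) if q in cells]
    if c.type == "output":
        if c.value == 1 and expected[p] == 0:
            c.fix = "FIX0"
            return True
        if c.value == 0 and expected[p] == 1:
            c.fix, c.chosen = "FIX1", rng.randrange(3)  # (*) heuristics omitted
            return True
        return False
    if c.type == "OR" and c.value == 1 and any(o.fix == "FIX0" for o in outs):
        gates = [o for o in outs if o.type in ("OR", "output")]
        if not gates or any(o.fix == "FIX0" for o in gates):
            c.fix, c.type = "FIX0", "none"
            return True
    if (c.type in ("none", "OR") and c.value == 0
            and specified_by_fix1_fan_out(cells, p)):
        c.fix, c.type, c.chosen = "FIX1", "OR", rng.randrange(3)  # (*)
        return True
    return False

def run_rules(cells, expected, rng):
    """Repeat the rules until no cell can make a transition."""
    changed = True
    while changed:
        changed = False
        for p in list(cells):
            changed |= try_transition(cells, p, expected, rng)

# A three-cell chain whose output is 1 although 0 is expected.
cells = {(0, 0, 0): Cell("input", 1), (1, 0, 0): Cell("OR", 1),
         (2, 0, 0): Cell("output", 1)}
run_rules(cells, {(2, 0, 0): 0}, random.Random(0))
```

In this example the output node receives FIX0, after which the OR gate feeding it is deactivated. The removal of dangling branches and the processing of successive input-output sets are not covered by the sketch.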

The above algorithm is executed for each set of input and expected output signals, 
and a Boolean circuit of OR gates is formed or modified. After execution for each 
set, the value and fix components of all cells are reset, and the next set is prepared. 
After all of the sets have been processed, the first set is processed again. This process 
is repeated until no change is made to the Boolean circuit for any of the sets. 

We had previously tried several algorithms, from which we formulated the one 
above. The proposed algorithm finds a Boolean circuit that satisfies the input-output
sets, as in Fig. 4. 

It is possible to define self-stability over multiple sets; however, we have yet to 
prove the self-stability of the algorithm. 

Fig. 4 The algorithm constructs a Boolean circuit composed of OR gates, four input nodes, and two
output nodes. The input nodes are shown in blue (value 1) and cyan (value 0), and the output nodes
are shown in magenta (value 1) and red (value 0). The OR gates are shown in green (value 1) and
yellow (value 0). This circuit was constructed from four input-output sets, of which one is shown


5 Concluding Remark 


In this article, we first reviewed the research efforts in DNA computing that have 
led to the implementation of cellular automata. Then, we summarized our work on 
gellular automata, focusing on self-stability. Finally, we proposed our third model 
of gellular automata that can learn Boolean circuits in 3D from sets of input-output 
signals. More work remains, including proving the self-stability of the algorithm. 

Although gellular automata are based on implementing cellular automata by
reaction-diffusion systems, the approach at the molecular level is also attractive and has
various applications. It may be possible to combine the two approaches. 

Regardless of which approach is taken, when a new concrete implementation 
method is developed, it may be necessary to update the existing models. It is also inter- 
esting to extend existing models assuming possible future implementation methods. 
One possibility is the introduction of directionality into a cellular space, as we 
assumed in our third model. Directionality can be introduced by electric or magnetic 
fields or chemical gradients. With such novel factors, communication between cells 
will be enhanced, with corresponding enrichment of cellular automata models. 


and Petra Schwille


Abstract Spatial organization on the atomic scale is one of the key objectives of 
nanotechnology. The development of DNA nanotechnology is a hallmark of mate- 
rial programmability in 2D and 3D, in which the large variety of available DNA 
modifications allows it to be interfaced with a number of inorganic and organic 
materials. Nature’s solution to spatiotemporal control has been the evolution of self- 
organizing protein systems capable of pattern formation through energy dissipa- 
tion. Here, we show that combining DNA origami with a minimal micron-scale 
pattern-forming system vastly expands the applicability of DNA nanotechnology, 
whether for the development of biocompatible materials or as an essential step 
toward building synthetic cells from the bottom up. We first describe the interac- 
tion of DNA origami nanostructures with model lipid membranes and introduce the 
self-organizing MinDE protein system from Escherichia coli. We then outline how 
we used DNA origami to elucidate diffusiophoresis on membranes through MinDE 
protein pattern formation. We describe how this novel biological transport mecha- 
nism can, in turn, be harnessed to pattern DNA origami nanostructures on the micron 
scale on lipid membranes. Finally, we discuss how our approach could be used to 
create the next generation of hybrid materials, through cargo delivery and multiscale 
molecular patterning capabilities. 


1 Introduction


The plasma membrane is a rather complex organelle, formed from a bilayer composed
mostly of a mixture of phospholipids, sphingolipids, cholesterol, and a variety of 
peripheral and integral proteins [1, 2]. Long believed to be merely a passive barrier 
between the cell and its surrounding environment, the plasma membrane plays a 
fundamental role in a number of biological processes, such as cell signaling and signal 
transduction, motility, and cellular division. While research on such phenomena is of
high interest at both the fundamental and technological levels, their complexity often
requires the use of model systems that allow their interrogation in a controlled
environment [3].

Supported lipid bilayers (SLBs) are one particular example of model lipid 
membranes that are of great interest, as they can not only recapitulate the hetero- 
geneous composition and lateral dynamics of plasma membranes, but can also 
be patterned in 2D and 3D. While they are traditionally formed on continuous 
planar surfaces, over the last two decades SLBs have been combined with
nanostructured surfaces (Fig. 1a) [4-10], patterned substrates [11-13], microfabricated
compartments [14, 15], and 3D printed complex architectures [16]. 

Beyond controlling membrane topology and topography, several methods can be 
applied to control molecular distribution and localization in and on SLBs. Lipid 
phase separation [17, 18] in membranes composed of lipids of different melting 
temperatures is itself an example of molecular demixing [19-21] (Fig. 1b) and can 
be harnessed to change the distribution of membrane-coupled proteins [22-24]. On 
the other hand, external forces can be applied to control molecular distribution: Elec- 
tric fields can be used to generate gradients of charged lipids [25, 26], while surface 
acoustic waves have been shown to result in lipid demixing and even protein
transport and accumulation [27-29]. Light responsive molecules can also be exploited to
control membrane properties—the incorporation of azobenzene groups in a lipidic 
hydrophobic moiety results in control over lipid phase separation [30-33], while 
control over protein binding and localization can be achieved by the fusion of light 
responsive membrane-binding domains to proteins of interest [33-35]. While most of
these methods offer reversible control over membrane properties and molecular local- 
ization, their applicability can be restricted due to membrane diffusion, introduced 
perturbations, spatial and/or temporal resolution, or scalability.

DNA-based nanostructures take advantage of the unique and inherent properties 
of the DNA double helix and are the paragon of a molecular breadboard. With its 
nanoscale addressability and diversity of possible modifications, complex molecular 
patterns can be built on the surface of DNA assemblies in 2D and 3D (Fig. 1d) [36-39]. 
It is thus tempting to employ DNA origami as a tool for patterning lipid membranes. 
Indeed, a lot of attention has been given in recent years to the development and 
investigation of membrane-active DNA nanostructures [40]. From the exploration of 
a variety of membrane-binding strategies, including different hydrophobic moieties 
[41-44], ligand-receptor type binding [43], covalent conjugation [45], and ionic 
strength [46], to the development of triggerable membrane-binding nanostructures
[47, 48], channels [49-51] and membrane shaping coats [52-56], we now have a good
understanding of how to control membrane binding and dynamics of DNA origami
nanostructures [57-59]. For example, we know that not only the number, but also the
position and accessibility of membrane anchors, govern the efficiency of membrane
binding. Different strategies to attach the membrane anchor to the DNA origami
will also result in different levels of sensitivity to variations in local electrostatics
(Fig. 2a, b). However, patterning of DNA origami on lipid membranes has so far
been limited to the assembly of large-scale arrays or the preferential binding of DNA
nanostructures to different lipid phases (Fig. 1b) [42, 60-63].

[Fig. 1 panel titles: Membrane topology; Molecular distribution; Protein self-organization]

Fig. 1 Molecular patterning strategies, with emphasis on model lipid membranes as mimics of
natural membranes. a Epifluorescence photographs of a corral array (20 × 20 µm) of a supported
lipid bilayer partitioned by a microfabricated grid of chrome lines. Corrals in the center were
photobleached (dark) with a circular spot, highlighting how the barriers (dark lines) prevent diffusive
mixing of lipids between corrals. b Cholesteryl-TEG modified three-point star DNA tiles
preferentially bind and self-assemble on the liquid-ordered phase of phase-separated DOPC/DPPC
supported lipid bilayers. c Myosin II action on F-actin bound to supported lipid bilayers induces
distinct patterns on the membrane such as filament bundles (left) and linked, polar asters (right).
d Model and AFM image of patterned DNA origami triangles combined into a hexagon shape.
Scale bars: b, 200 nm and inset 25 nm; c, 10 µm; d, 100 nm. a, adapted with permission from
[5], Copyright 1998 American Chemical Society; b, adapted with permission from [62], Copyright
2017 American Chemical Society; c, adapted with permission from [66]; d, reprinted by permission
from Springer Nature: Nature, [36], Copyright 2006

Biological spatiotemporal organization is achieved by self-organizing protein 
systems that are capable of pattern formation and large-scale molecular transport 
Fig. 2 DNA origami as a tool to elucidate the physical mechanism underlying cargo transport 
by MinDE self-organization. a AFM image of the bare DNA origami nanostructure (20-helix 
bundle; 110 × 16 × 8 nm) deposited on mica. b Charge sensitivity of membrane-binding 
DNA origami structures depends on the strategy used to attach the membrane-anchoring moiety. 
While TEG-chol moieties inserted directly into the structure result in poor DNA origami binding 
to membranes containing negatively charged lipids, when compared to bare DOPC, the use of an 
18-nucleotide-long double-stranded DNA linker results in efficient membrane binding in 
both conditions. Representative confocal microscopy images of the equatorial plane of giant 
unilamellar vesicles (GUVs) incubated with a 3 nM solution of DNA nanostructures modified with two 
TEG-chol anchors. GUVs contained 0.005 mol% Atto655-DOPE for fluorescence imaging, while 
each origami structure carried three Atto488 dyes. c Examples of patterns formed by MinDE 
self-organization on planar supported lipid bilayers in vitro (left: 0.75 µM MinD, 2 µM His-MinE; right: 
1 µM MinD, 1.5 µM MinE-His). d Schematic of the MinDE self-organization mechanism and the 
synthetic membrane-anchored cargo consisting of a DNA origami nanostructure and streptavidin 
building blocks. The DNA origami nanostructure has 7 dyes at the upper facet and 42 addressable 
sites for incorporation of biotinylated oligonucleotides at the lower facet, which in turn bind to 
lipid-anchored streptavidin on the SLB. Fueled by ATP hydrolysis, MinDE attach and detach to and from 
the membrane in a concerted manner. e The contrast of the patterns resulting from DNA origami 
transport by MinDE increases with increasing number of incorporated streptavidin per cargo, as 
does the size of the MinD minima. Representative time series and line plots for origami 
nanostructures equipped with 2 or 42 streptavidin building blocks (1 µM MinD (30% EGFP-MinD), 1.5 µM 
MinE-His in presence of 0.1 nM origami-Cy5 with 2 or 42 biotinylated oligonucleotides, 
streptavidin). f MinDE self-organization induces sorting of two cargo species with distinct membrane 
footprint. Representative images and line plots for simultaneous transport of cargo-2 and 
cargo-42 by MinDE (1 µM MinD (30% EGFP-MinD), 1.5 µM MinE-His, 50 pM origami-Cy3b with 
two biotinylated oligonucleotides, and 50 pM origami-Cy5 with 42 biotinylated oligonucleotides, 
non-labeled streptavidin). g Schematic of the diffusiophoretic mechanism underlying the molecular 
transport by MinDE. MinDE reactions and diffusion generate MinDE patterns and density 
gradients. The diffusive fluxes of MinD exert a frictional force, fe, on the cargo molecules that depends 
on the effective size of the cargo molecules. Scale bars a, 400 nm, b, 10 µm, c-f, 50 µm; a, adapted 
from [58], b, adapted with permission from [59]. Copyright 2018 American Chemical Society. d-g, 
adapted from [89] under a CC BY 4.0 license 


through energy dissipation. Much emphasis has been laid on understanding and 
harnessing complex eukaryotic protein systems based on cytoskeletal and translational 
motor proteins (Fig. 1c) [64-69]. In contrast, bacterial systems have largely 
been neglected, even though they are often simpler on both a mechanistic and a 
compositional level [70], making them ideal targets for nanotechnological applications. One 
such example comes from the bacterium Escherichia coli: the Min system. At its 
core, it consists of two proteins only, the ATPase MinD and the ATPase activating 
protein MinE, ATP as an energy source, and a lipid membrane as a reaction platform. 
The two proteins interact with each other and the lipid membrane to form patterns via 
a reaction-diffusion mechanism. This compositional simplicity paired with its rich 
dynamics made the system a paradigm for the study of biological pattern formation 
[71]. Over 30 years of research in vivo, in vitro, and in silico allowed us to obtain 
a rather detailed understanding of the underlying molecular mechanism [72-79]. 
In short, MinD binds to the membrane cooperatively upon ATP-induced dimeriza- 
tion, presumably involving the formation of higher order structures. MinE binds 
to membrane-bound MinD stimulating its ATPase activity, which leads to protein 
detachment from the membrane (Fig. 2d). In E. coli, MinDE performs pole-to-pole 
oscillations which generate a time-averaged gradient of the passenger protein MinC 
with a minimum at midcell. As MinC is an inhibitor of the main divisome protein 
FtsZ, this time-averaged protein gradient restricts the assembly of the cell division 
machinery to the middle of the cell. In 2008, MinDE dynamics were reconstituted 
in vitro: the purified proteins MinD and MinE, supplied with ATP as an energy source, 
self-organized on supported lipid bilayers in aqueous buffer, forming traveling surface 
waves [77]. 
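
The time-averaged gradient described above can be illustrated with a deliberately idealized sketch. It assumes, purely for illustration and not taken from the cited work, that the inhibitor forms an exponentially decaying zone at one pole during each half-period and that the two half-periods are mirror images of each other. The function names, the cell length, and the decay length are hypothetical.

```python
import math

def polar_zone(x, length, decay, pole):
    """Hypothetical inhibitor density decaying exponentially away from one pole."""
    distance = x if pole == "left" else length - x
    return math.exp(-distance / decay)

def time_averaged_profile(length=3.0, decay=0.8, points=31):
    """Average the two mirror-image half-periods of an idealized pole-to-pole oscillation."""
    xs = [length * i / (points - 1) for i in range(points)]
    profile = [
        0.5 * (polar_zone(x, length, decay, "left") + polar_zone(x, length, decay, "right"))
        for x in xs
    ]
    return xs, profile

xs, profile = time_averaged_profile()
midcell_index = profile.index(min(profile))
print(f"time-averaged minimum at x = {xs[midcell_index]:.2f} (cell length {xs[-1]:.2f})")
```

In this toy picture, each half-period on its own is strongly asymmetric, yet the average over a full period is highest at both poles and lowest at midcell, which is the qualitative feature that restricts division to the cell middle.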

Depending on the specific reaction conditions, the system has been shown to 
exhibit a variety of patterns in vitro, such as traveling surface waves, dot and 
labyrinthine patterns, and also oscillatory behavior (Fig. 2c) [12, 77, 80, 81]. Although 
the reaction system is truly nonlinear, i.e., small changes in the reaction conditions 
can lead to drastic changes in the reaction outcome, considerable effort has 
been invested in determining the parameters influencing pattern formation in order 
to achieve control over it. Protein concentration and ratios [77, 81], ionic strength of 
the buffer [82], molecular crowding [7, 14, 83], membrane charge and fluidity [12, 
82, 84], and in particular the reaction space geometry, including the form and size 
of the membrane as well as the surface to volume ratio, heavily influence the type 
of obtained patterns. For example, when MinDE is confined in membrane-coated 
PDMS microcompartments with an elongated, cell-like shape, it recapitulates the 
oscillatory behavior that occurs in the bacterial cells in vivo [12, 14, 78]. On planar 
rectangular membrane patches, in turn, MinDE traveling surface waves align along 
the longest axis [7]. Beyond modifications to the reaction surface, engineering of 
both MinD and MinE has revealed that modifications of key molecular motifs, such 
as the membrane binding and dimerization helices, can change the type of MinDE 
patterns, such that robust standing waves or MinDE traveling waves with wavelengths 
on the millimeter scale can be generated [85, 86]. These examples and many others 
found in the literature [71] clearly demonstrate that the Min system, although compo- 
sitionally simple, is a striking pattern-forming system whose behavior can be tightly 
controlled using a varied set of parameters. 

Here, we first review how DNA nanotechnology has been previously harnessed to 
elucidate physical mechanisms governing complex biological phenomena, i.e., the 
transport of molecular cargo by MinDE dynamics. Building on this work, we then 
investigate how MinDE-dependent transport of DNA origami nanostructures can be 
used to generate stable, biologically compatible patterns at the micron to millimeter- 
scale. We further speculate how combining the rich dynamics of the self-organizing 
MinDE system with the nanometer-precision addressability of DNA origami nanos- 
tructures opens up new possibilities in creating the next generation of hybrid materials 
with multiscale molecular patterning capabilities. 


2 DNA Origami as a Tool to Elucidate Molecular 
Mechanisms 


Recently, we and others have shown that the Min system can not only form mesmer- 
izing patterns on membranes itself, but also induce patterns of other functionally 
unrelated proteins [87, 88]. These studies showed that MinDE-induced regulation of 
lipid-bound proteins that have a long membrane dwell-time results in their net trans- 
port. This suggested that the non-specific interaction of MinDE with these proteins 
modulates their diffusion on the membrane, an effect that should depend on the prop- 
erties of the transported, membrane-bound molecules, such as the area they occupy 
on the membrane, i.e., the membrane footprint, or their diffusion on the membrane. 
In order to specifically probe distinct hypotheses regarding the underlying mecha- 
nism, it was desirable to modify specific cargo properties over a wide range, while 
keeping all other parameters comparable. While the properties of membrane-attached 
proteins cannot easily be tuned in a defined fashion, DNA origami nanostructures are 
fully programmable, as illustrated across this book. Importantly, the binding and diffusion 
of DNA origami nanostructures on model membranes have been characterized 
extensively, as described above, rendering them an ideal cargo for the 
detailed study of MinDE-dependent cargo transport [89]. 

The DNA origami nanostructure employed was a 20-helix bundle (110 × 16 × 8 nm) with 42 
addressable positions at the bottom facet that can be modified with cholesteryl or biotinyl 
moieties for membrane anchoring [58] (Fig. 2a, d). While the 
origami structures with cholesteryl oligonucleotides directly bound to the membrane, 
the ones with biotinylated oligonucleotides could be decorated with streptavidin 
which in turn bound to biotinylated lipids on the SLB. The former design allowed us 
to vary the diffusion of the structures while maintaining similar membrane footprints; 
the latter was used to vary the membrane footprint and thus, the effective size of the 
molecule via the amount of incorporated streptavidin molecules. For visualization by 
fluorescence microscopy, the DNA origami nanostructures were functionalized with 
seven dyes on the upper facet. In order to test the redistribution of these structures in a 
controlled manner, we used conditions under which MinDE formed quasi-stationary 
patterns, i.e., after an initial chaotic self-organization phase, labyrinthine patterns 
emerge, whose macroscopic appearance is stable over long periods of time, but 
which are nevertheless maintained by continuous binding and unbinding of the indi- 
vidual proteins. We found that MinDE was also capable of redistributing these rather 
large structures (compared to the 5 nm-sized MinDE proteins [90] and the previously 
investigated membrane-bound proteins [87, 88]) leading to an anti-correlated pattern 
of the DNA origami on the membrane. Comparing the final quasi-stationary patterns 
for structures bearing different kinds and numbers of membrane anchors, we were 
able to show that the extent of the molecular transport by MinDE increased with 
the membrane footprint of the cargo molecule, i.e., the effective size of the cargo 
molecule on the membrane surface (Fig. 2e), but did not directly correlate with cargo 
diffusion. In contrast, in experiments with conditions under which MinDE formed 
traveling surface waves, we found that both the cargo’s effective size and its diffusion 
coefficient play a role in its transport efficiency. Cargo molecules that are too 
slow in comparison to the wave propagation will not be transported effectively. Even 
more intriguingly, we showed that MinDE self-organization was capable of spatially 
sorting DNA origami nanostructures with distinct effective sizes, i.e., one with 2 and 
one with 42 streptavidin molecules bound to its bottom surface (Fig. 2f). 

Our experiments allowed us to rule out several possible physical mechanisms, 
arriving at diffusiophoresis as the simplest one that could fully explain our observations 
(Fig. 2g). Diffusiophoresis generally describes the transport of particles in 
fluids along concentration gradients of small solutes and has mostly been reported in 
a non-biological context such as colloid transport [91, 92]. During MinDE self- 
organization, density gradients of the proteins on the membrane are established 
which lead to diffusive protein fluxes on the membrane toward low protein densi- 
ties. Due to the high density of proteins on the membrane, MinDE proteins directly 
interact with other molecules on the membrane, generating a frictional force on these 
“cargo” molecules. As a result, the diffusive fluxes of MinDE and cargo couple on 
the membrane, leading to their transport toward, and subsequent accumulation in, 
areas of low MinDE density. As the frictional force increases with the effective size 
of the molecule, molecules with a larger membrane footprint experience a stronger 
redistribution than smaller ones. Although this first description of diffusiophoresis by 
a biological self-organizing system was achieved in vitro, diffusiophoresis could be 
more widespread in cellular systems, but might be masked by the stronger, specific 
molecular interactions within the cell. 
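
The qualitative consequences of this picture can be sketched with a toy calculation. It is not the quantitative model of [89]. It assumes that the cargo drift is proportional to the membrane footprint and to the local MinD density gradient, and that cargo mobility follows the Einstein relation, so that a zero-net-flux steady state gives a cargo profile proportional to exp(-k · footprint · ρ(x)). The lumped constant `coupling` (k), the sinusoidal MinD density ρ(x), and all function names are hypothetical.

```python
import math

def cargo_profile(min_density, footprint, coupling=0.05):
    """Toy steady state: c(x) proportional to exp(-coupling * footprint * rho(x)).

    `coupling` is a hypothetical lumped constant in arbitrary units, not a fitted value.
    The returned profile is normalized to a mean of 1.
    """
    weights = [math.exp(-coupling * footprint * rho) for rho in min_density]
    mean = sum(weights) / len(weights)
    return [w / mean for w in weights]

def contrast(profile):
    """Michelson contrast, (max - min) / (max + min)."""
    return (max(profile) - min(profile)) / (max(profile) + min(profile))

# Hypothetical periodic MinD density (arbitrary units) standing in for one pattern period.
n_points = 200
min_density = [0.5 * (1.0 + math.cos(2.0 * math.pi * i / n_points)) for i in range(n_points)]

cargo_2 = cargo_profile(min_density, footprint=2)    # origami with 2 streptavidin
cargo_42 = cargo_profile(min_density, footprint=42)  # origami with 42 streptavidin

print(f"contrast cargo-2:  {contrast(cargo_2):.2f}")
print(f"contrast cargo-42: {contrast(cargo_42):.2f}")
```

Under these assumptions, the cargo accumulates where the MinD density is lowest and the contrast grows with the footprint. The cargo diffusion coefficient drops out of the steady state, which is at least consistent with the observation that transport in quasi-stationary patterns did not directly correlate with cargo diffusion.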


3 Stable DNA Origami Patterns on Lipid Membranes 


Harnessing the established experimental framework, we set out to explore the possi- 
bilities arising from the ability to pattern DNA origami nanostructures by MinDE 
self-organization. In this manuscript, we started by testing whether the resulting 
patterns could be stabilized after they had been established. DNA origami distribu- 
tion out of equilibrium is maintained by the energy dissipation upon MinDE self- 
organization and is as such reversible: DNA origami patterns respond to changes 
in MinDE patterns induced, for example, by the addition of more MinE [89] and 
DNA origami patterns disappear altogether by thermal mixing when MinDE activity 
subsides [87, 88]. This reversibility may become inconvenient for any downstream 
applications, considering the dynamic nature of lipid membranes. Thus, it would 
be desirable to “freeze” MinDE-induced cargo patterns, e.g., through crosslinking. 
Previous work has described two main strategies for hierarchical self-assembly of 
large DNA-based structures (for a more detailed review, refer to [93]): “sticky” inter- 
actions [94-96], based on DNA base-pairing and “stacking” interactions [97-99], 
based on blunt-end interactions and shape complementarity of building blocks. While 
the first one can be accomplished by programming complementary single-stranded 
DNA extensions between building blocks or by addition of a single-stranded DNA 
oligonucleotide complementary to single-stranded portions on the DNA structures, 
the second one can be controlled by the concentration of Mg²⁺ in solution. As MinDE 
self-organization is sensitive to the ionic strength of the buffer [82], we here opted for 
the first strategy and modified our DNA origami nanostructures to contain 8-nucleotide-long 
poly-A ssDNA extensions on each end. The addition of a complementary 14-nucleotide-long poly-T 
polymerization staple would result in the crosslinking of DNA nanostructures and 
their polymerization in an end-to-end fashion. 

First, we verified the behavior of a priori polymerized DNA origami nanostruc- 
tures upon MinDE self-organization (Fig. 3a). A homogeneous, membrane-bound 
layer of DNA origami was crosslinked before MinDE self-organization was started 
with ATP. Upon ATP addition, MinDE self-organized into labyrinthine patterns, as 
did the excess membrane-bound streptavidin. However, polymerized DNA origami 
did not get patterned, remaining homogeneously distributed. This shows that the 
DNA origami nanostructures indeed formed large stable polymers on the membrane 
whose diffusional behavior was virtually frozen and as such could not be transported 
by MinDE, as the diffusiophoretic transport requires diffusive mobility of both the 
cargo and MinDE proteins. 

In contrast, when we started MinDE self-organization in presence of these DNA 
origami structures prior to the addition of polymerization staples, MinDE dynamics 
induced anti-correlated DNA origami patterns similar to those observed for non- 
modified DNA origami nanostructures (Fig. 3b). After MinDE patterns entered the 
quasi-stationary phase, we added the polymerization staple to “freeze” the patterns 
in place. Indeed, when we added more MinE, only the MinDE patterns changed, 
whereas the DNA origami retained their original pattern. Intriguingly, we could 
observe the uncorrelated patterning of two cargo molecules: While the crosslinked 
DNA origami retained the original pattern, the excess membrane-anchored protein 
streptavidin, here used to bind DNA origami to the membrane, was transported by the 
reorganized MinDE patterns. In the control experiment, in which the polymerization 
staple was added to a DNA origami structure that cannot be crosslinked (i.e., without 
single-stranded overhangs), the addition of MinE resulted in concomitant and anti- 
correlated pattern reorganization of MinDE and DNA origami (Fig. 3c). 

Having shown that MinDE-induced DNA origami patterns could be spatially 
stabilized, we next asked whether other molecules could be transported or targeted 
using DNA origami as a shuttle or scaffold. Over the years, many studies have 
shown that molecules can be targeted to DNA origami nanostructures with nanometer 
precision via a wide variety of different chemistries [100]. As a proof of concept, we 
used a DNA origami structure that contains 15 cholesteryl moieties for membrane 
binding and which is strongly redistributed by MinDE self-organization [89]. We 
replaced the fluorescent dye staples on the upper facet of these structures with biotinyl 
moieties which allowed the coupling of fluorescently labeled streptavidin to the 
upper facet of the DNA origami (Fig. 3d). Indeed, MinDE was also capable of 
transporting this streptavidin-loaded DNA origami, resulting in high-contrast, anti- 
correlated patterns (Fig. 3e). When adding biotinylated stabilized actin filaments to 
such pre-patterned streptavidin-bearing DNA origami structures, actin preferentially 
bound to the regions enriched in DNA origami, i.e., the MinDE minima. Moreover, 
the presence of multiple binding sites per filament resulted in pattern crosslinking. 

Taken together, the new results presented herein are a proof of concept of how 
DNA origami, when combined with the MinDE system, can be used to create stable 
and spatially uncorrelated patterns of distinct molecules on a dynamic support, such 
as a lipid membrane. 


4 Challenges and Opportunities 


Since the birth of DNA nanotechnology, the biophysical community has been 
exploiting the well understood behavior of DNA and its versatility regarding conjuga- 
tion/modification [100] to interrogate a number of fundamental biological processes 
[101]. From single molecule methods to cellular studies, researchers used DNA 
origami to investigate enzymatic cascades [102-104], motor proteins [105, 106], 
DNA binding proteins [107-109], and even receptor-mediated cellular responses, 
e.g., apoptosis, phagocytosis, B cell activation, T cell stimulation [110-114], to 
name only a few. Here, we reviewed how our detailed knowledge and fine control 
over membrane binding and diffusion of DNA origami nanostructures allowed us to 
unravel a new non-specific transport mechanism by the self-organizing MinDE system 
of E. coli [89]. Our recent results reiterate how the power of DNA origami can be 
used to interrogate complex dynamic phenomena in a controlled fashion and further 
expand the available in vitro toolkit. 

Non-specific transport based on diffusiophoresis should in principle occur in any 
system that generates and/or maintains concentration gradients. We have shown that 
(membrane-bound) DNA origami is an ideal tool to characterize such phenomena 
in detail. For example, similar approaches could be used to explore diffusiophoretic 
transport or related effects in other self-organizing membrane systems such as small 
GTPases [115], membrane-bound kinase/phosphatase networks [10], or minimal 
actin cortices [64-66]. Soluble DNA origami structures might even be harnessed to 
study incorporation and molecular transport that occur during liquid-liquid phase 
separation of biomolecules, a phenomenon that recently gained considerable attention 
in cell biology [116]. With the routine establishment of uncorrelated patterns, the 
complexity of such assays can be increased, making it possible to explore the interplay between 
simultaneously occurring dynamics of distinct systems, as observed in cells. 

To date, in the quest for biomolecular spatiotemporal patterning and trans- 
port, the focus has mostly been laid on using eukaryotic active matter elements 
on solid supports, which mediate cargo transport via specific interactions, such 
as the ParMRC system, actin, microtubules, and motor proteins [67-69]. Despite 
not being a “conventional” biological motor system, diffusiophoretic cargo trans- 
port by MinDE self-organization has the potential to enable complex biomolecular 
patterning on membranes with particularly high modularity when coupled to DNA 
origami, the model tool of molecular nanopatterning [117]. The variety of modifi- 
cation chemistries for membrane binding or molecular targeting, crosslinking and 
actuation strategies of DNA origami, paired with the diversity of possible patterns 
and various control elements for the Min system, as well as the ease of purification 
of the MinDE proteins (small proteins as compared to eukaryotic molecular motors) 
and the non-specific nature of the MinDE-mediated transport that overcomes the 
need for additional adapter proteins, are all factors that contribute to making this an 
attractive combination for a variety of applications (Fig. 4a). 
opens up new possibilities for molecular transport and patterning. a Schematic highlighting 
the variety of control elements available for self-organizing protein systems such as MinDE and 
DNA origami nanostructure toolboxes. b MinDE accumulates DNA origami structures along the 
long axis of patterned lipid membranes. Representative time series of MinDE traveling surface 
waves transporting DNA origami along the wave vector on chromium-patterned SLBs. c Large-scale 
gradients of cargo molecules established by a minimal MinDE system (indicated by white 
arrows). Representative images of MinE(1-31)-msfGFP and MinD self-organization in the presence 
of membrane-bound streptavidin (2.5 µM MinD (20% Alexa647-MinD), 200 nM MinE(1- 
preferential localization of the cargo molecules toward the center of the compartment. Representative 
time-lapse images and time-averaged fluorescence intensity profile of MinDE oscillations 
and streptavidin counter-oscillations in PDMS microcompartments (1 µM MinD, 2 µM MinE, 
streptavidin-Alexa647). e Schematic highlighting the potential of generating two distinct 
time-averaged gradients of DNA origami in cell-shaped compartments simultaneously. b was adapted 
from [89] and d from [87] under a CC BY 4.0 license. Scale bars b, 25 µm, c, 1 mm, d, 10 µm 

For example, by combining the ability to direct MinDE traveling waves on planar 
patterned bilayers [7] with DNA origami, reproducible directed molecular transport 
can be achieved, resulting in a gradient of the cargo of interest along the longest axis 
of a membrane patch [89] (Fig. 4b). On the other hand, we show here that MinDE- 
dependent net transport of molecules can occur over several millimeters and establish 
millimeter-sized gradients on the membrane (Fig. 4c), when MinE mutant peptides 
are employed [85]. As such, arbitrary cargo tethered to a DNA origami shuttle could 
be transported directionally over several millimeters and could even be controlled by 
light [33]. By harnessing the current technologies of microfabrication and membrane- 
coating of surfaces (see Introduction for more detail), exciting possibilities arise for 
macromolecular “writing” at a scale visible to the naked eye, resulting in a new 
generation of biohybrid devices. Hence, directed and controlled diffusiophoretic 
transport of DNA origami structures by MinDE could potentially be explored for 
molecular patterning, molecular delivery, biocomputation, or molecular separation 
based on membrane footprint. 

Besides its potential application in nanotechnology, DNA origami transport 
by MinDE also offers new possibilities to achieve spatiotemporal organization 
in synthetic cells. We have shown in the past that MinDE self-organization can 
spatiotemporally position molecules via diffusiophoresis to the center of cell-shaped 
compartments [87, 88] (Fig. 4d). Using DNA origami structures with distinct 
membrane footprints, it should now be possible to target distinct molecules to at 
least three different regions of such a cell-like compartment (Fig. 4e). Furthermore, 
the concentration contrast between these regions can be controlled by adjusting the 
differences between DNA origami footprints. Combined with the recent strides that 
have been made to encapsulate a functional Min system into 3D compartments, such 
as monolayer sealed microfabricated compartments [15], water-in-oil droplets [118] 
or deformable giant unilamellar vesicles (GUVs) [119, 120], DNA origami transport 
by MinDE could then be exploited for basic spatial compartmentalization of 
synthetic cells, e.g., to tether and segregate genetic material or compartmentalize the 
activity of molecular machines. Using functional DNA origami nanostructures that 
can be actuated by a number of mechanisms, e.g., strand displacement or insertion, 
pH, temperature, electric field, or light [121-127], even a functional divisome could 
potentially be generated, a long-standing goal of the field. 

In conclusion, the versatility of DNA nanotechnology and the rich dynamics 
of self-organizing proteins such as the MinDE system can be harnessed together 
to transport, deliver, pattern, and sort functionalized cargo on membranes, thereby 
creating a new class of hybrid biomaterials for applications in nanotechnology and 
the bottom-up construction of synthetic cells. 


5 Materials and Methods 


Most of the experimental methods and materials for this chapter have been described 
in detail before [58, 89, 128], but are described in brief below, highlighting 
modifications and new methods. 

MinDE plasmids and proteins—the plasmids pET28a-His-MinD_MinE and 
pET28a-His-MinE [77], pET28a-His-EGFP-MinD [11], pET28a-MinE-His [11, 
77, 80, 81] and pET28a-MinE(1-31)-msfGFP [85] were used for purification 
of His-MinD, His-MinE, His-EGFP-MinD, MinE-His, and MinE(1-31)-msfGFP, 
respectively, as described in detail before [128]. 

DNA origami nanostructures—the modifications and the functionalization of the 
previously designed elongated DNA origami nanostructure [58] have been described 
in detail in [89]. For polymerizable DNA origami structures, end staples with 8A 
extensions were added upon DNA origami folding. Biotin-top-labeled DNA origami 
structures contained 7 modified oligonucleotides attached to extended staples on 
the upper facet. The assembly of the origami structure was performed in a one-pot 
reaction mix as described previously [58]. 

Self-organization assay on SLBs—SLBs were prepared as described before with 
a lipid composition of 30 mol% DOPG, 69 or 70 mol% DOPC, and 1 mol% CAP-Biotinyl-PEG 
in the case of DNA origami with biotinyl moieties [128]. Binding of DNA origami 
to lipid membranes via streptavidin interactions or cholesteryl anchors has been 
previously described [89]. Self-organization assays were performed essentially as 
detailed in [128]. For large-scale MinDE traveling waves, the MinE peptide 
MinE(1-31)-msfGFP [85] (200 nM) and 2.5 µM MinD (20% Alexa647-MinD) were used 
in the self-organization assay, which was performed in sticky-Slide VI 0.4 chambers 
(ibidi GmbH, Gräfelfing, Germany). 

Crosslinking of DNA origami by strand hybridization—to trigger origami 
crosslinking, the polymerization staple (14 T) was added to the chamber at a final concentration 
of 17 µM, and the reaction mixture was mixed by pipetting. The large volume 
addition (40 µl) induced changes to MinDE patterns in some experiments. To further 
change the MinDE pattern, 1.5 µM MinE-His was added to the reaction mixture. 

Attachment of actin to DNA origami—for attachment of labeled streptavidin to 
DNA origami, cholesteryl-modified DNA nanostructures were first bound to the 
membrane. Subsequently, the chamber was incubated with Alexa568-labeled streptavidin 
(ThermoFisher Scientific) at a final concentration of 1 µg/ml. After 5-10 min 
incubation, unbound streptavidin was removed by gently washing 3 times with a total 
volume of 600 µl reaction buffer. MinDE proteins were added to the reaction mixture, 
and the self-organization assay was started by addition of ATP. After pattern establishment, 
Alexa-647-Phalloidin-labeled biotinylated actin (produced as described in 
[65]) was added at a final concentration of 0.1 µM and allowed to bind to the DNA 
origami before image acquisition. 